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ABSTRACT 


We  propose  new  nonparametric  statistical  tests  to  identify  whether  each  element 
in  a  sequence  of  independent  multivariate  observations  is  drawn  from  a  common 
probability  distribution  or  if  some  distributional  change  has  occurred  over  the  course  of 
the  sequence.  Each  test  is  formulated  using  matching  techniques  based  on  distances 
between  observations.  These  tests  are  capable  of  detecting  changes  of  quite  general 
nature,  and,  unlike  most  similar  tests,  they  require  no  distribution  assumptions  or  any 
prior  separation  of  the  data  into  hypothetical  pre-  and  post-change  subsets.  We  derive  a 
central  limit  theorem  for  one  of  the  tests  and  an  exact  distribution  for  another.  A  third 
culminating  test,  which  is  a  cumulative  sum  of  statistics  on  a  collection  of  orthogonal 
matchings  associated  with  the  observation  sequence,  exhibits  noteworthy  power  to  detect 
whether  a  distributional  change  has  occurred.  We  examine  the  performance  of  the  tests 
by  computer  simulation  and  compare  results  to  a  state-of-the-art  parametric  competitor. 
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EXECUTIVE  SUMMARY 


Given  a  sequence  of  observations,  has  a  change  occurred  in  the  underlying 
probability  distribution  with  respect  to  observation  order?  This  question  arises  in  a 
variety  of  applications,  including  quality  control,  machinery  health  diagnosis, 
biosurveillance,  and  image  analysis.  This  problem  is  encountered  throughout  statistical 
literature  and  is  often  referred  to  as  “the  change-point  problem,”  where  “change  point” 
refers  to  the  index  of  the  first  observation  for  which  the  underlying  probability 
distribution  is  different  from  that  of  previous  observations.  Detecting  change  points  in 
high-dimensional  settings  is  challenging,  and  most  change-point  methods  for  multi¬ 
dimensional  problems  rely  heavily  upon  distributional  assumptions  such  as  multivariate 
nonnality  or  the  use  of  observation  history  to  model  probability  distributions.  In  practice, 
such  strong  distributional  assumptions  are  often  invalid  and  can  lead  to  poor  detection 
performance,  and  useful  observation  histories  are  often  unavailable.  Also,  most  change- 
point  methods  are  applicable  only  to  changes  of  a  specific  type  (for  example,  an  abrupt 
change  in  distribution  mean)  when  in  many  cases  one  is  interested  in  detecting  more 
general  types  of  change  as  well  (for  example,  changes  in  scale  or  gradual  changes). 

We  propose  new  nonparametric  statistical  tests  to  detect  the  presence  of  a  change 
point  in  a  sequence  of  multivariate  data  based  on  the  graph-theoretic  concept  of 
matching.  Each  test  requires  only  the  assumption  of  some  reasonable  function  to 
measure  dissimilarity  between  observations.  We  state  the  change-point  problem  by 
representing  the  observation  set  as  a  complete  graph  in  which  each  observation  is  a  vertex 
and  each  pair  of  vertices  is  connected  by  an  edge  whose  weight  is  the  dissimilarity 
between  the  two  vertices.  Then  we  pair  observations  together  in  such  a  way  as  to 
minimize  the  total  cost  of  the  collection  of  pairs;  this  collection  of  pairs  is  called  a 
matching.  Our  statistical  tests  for  a  change  point  use  the  fact  that  if  a  change  has 
occurred  with  respect  to  order  in  the  underlying  distribution  of  a  sequence  of 
observations  then  the  sequence  labels  of  the  pairs  in  the  matching  are  closer  together  on 
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average  than  if  no  distributional  change  has  occurred.  By  considering  not  just  the  lowest- 
cost  matching  but  rather  several  low-cost  matchings,  we  achieve  considerable  power  to 
detect  whether  there  is  a  change  point  in  a  sequence  of  observations. 

We  examine  the  perfonnance  of  these  tests  by  simulation  in  various  change-point 
settings  considering  different  underlying  probability  distribution  family,  dimensionality, 
change  point  location,  change  parameter  (distribution  location  or  scale),  type  of  change 
(abrupt  or  gradual),  and  magnitude  of  change.  Each  test  demonstrates  power  to  detect  a 
change  point  at  fixed  false  alann  rate  in  every  case  examined.  The  most  powerful  of 
these  tests  is  the  Ensemble  Sum  of  Pair-Maxima  (ESPM)  test,  which  computes  the 
cumulative  sum  of  the  larger  sequence  labels  among  all  pairs  in  a  collection  of  low-cost 
matchings  and  measures  the  deviation  of  this  sum  from  its  expected  value.  The  ESPM 
test  has  change  detection  power  comparable  to  a  state-of-the-art  parametric  competitor, 
the  maximum  likelihood  ratio  test  of  James,  James,  and  Siegmund  (JJS),  even  when  the 
parametric  assumptions  for  that  test  are  met.  When  those  assumptions  are  not  met,  the 
ESPM  test  retains  noteworthy  power  to  detect  a  change  point,  while  the  false  alann  rate 
of  the  JJS  test  increases. 


I.  INTRODUCTION 


The  problem  of  identifying  a  change  in  a  stochastic  process  is  often  referred  to  as 
“the  change-point  problem”  and  has  been  a  subject  of  enduring  interest  in  statistical 
literature.  A  simple  statement  of  this  problem  is,  “Given  a  sequence  of  observations,  has 
a  change  occurred  in  the  underlying  probability  distribution  with  respect  to  observation 
order?”  This  problem  arises  in  a  variety  of  applications,  such  as: 

•  Quality  control.  Samples  are  taken  from  a  particular  manufacturing  process, 
perhaps  over  time  or  across  different  stages  in  the  process,  that  carry  information 
regarding  the  quality  of  the  end  product.  Change-point  methods  are  used  to  indicate  if, 
where,  or  when  the  process  departs  from  an  “in-control”  condition. 

•  Machinery  diagnostics  and  fault  detection.  Consider  a  complex  machine  for 
which  various  measurements  are  taken  during  its  operation  that  provide  an  indication  of 
machine  health.  A  change  in  the  distribution  of  these  measurements  with  respect  to  time 
might  indicate  some  form  of  health  degradation. 

•  Biosurveillance.  Suppose  occurrences  of  a  particular  disease  are  cataloged  by 
geographic  location  and  monitored  over  time.  Change  point  methods  may  be  used  to 
detect  whether  a  disease  outbreak  has  occurred. 

•  Image  analysis.  Consider  a  sequence  of  images  of  the  same  scene  taken  at 
different  times:  for  example,  satellite  images  of  some  geographic  area  or  magnetic 
resonance  imaging  scans  on  an  individual.  Evidence  that  the  scene  is  changing  in  some 
significant  way  may  be  found  by  means  of  change  point  analysis. 

This  research  is  about  detecting  whether  change  has  occurred.  Specifically,  we 
investigate  nonparametric  tests  to  detect  change  points  of  very  general  nature  in 
multivariate  data.  Cases  of  “very  general  nature”  include  those  of  abrupt  or  gradual 
changes  in  the  mean  of  a  distribution,  or  changes  in  its  scale.  While  many  real-life 
processes  exhibit  gradual  change,  fairly  little  investigation  has  been  done  in  the  area  of 
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detecting  gradual  changes  in  multivariate  data.  Furthermore,  a  relatively  small  body  of 
work  exists  proposing  nonparametric  solutions  to  multivariate  change  detection 
problems,  although  interest  in  this  area  appears  to  be  growing. 

We  examine  nonparametric  methods  that  rely  on  matching,  which  involves  the 
pairing  together  of  observations  based  on  some  measure  of  dissimilarity.  Existing 
statistical  applications  of  matching  techniques  include: 

•  assessing  sensor  accuracy  in  test  and  evaluation  of  radar  and  joint-tracking 
systems  by  pairing  detected  objects  with  truth  objects, 

•  comparing  the  accuracy  of  different  methods  for  estimating  the  locations  of 
impact  points  in  munitions  testing  by  pairing  estimated  locations  to  “ground  truth,”  and 

•  pairing  subjects  based  on  similarity  measures  for  clinical  trials  and  observational 
studies, 

to  name  a  few.  In  this  paper,  we  introduce  new  methods  of  this  type  that  prove  to  be 
powerful  to  detect  change  over  a  wide  range  of  alternative  hypotheses. 

Our  work  is  organized  as  follows:  In  Chapter  II  we  classify  the  field  of  change- 
point  problems  into  its  two  main  categories,  sequential  and  non- sequential  techniques, 
with  a  formal  discussion  of  how  problems  in  each  category  are  framed.  We  then 
summarize  our  review  of  the  literature  in  this  field,  with  particular  emphasis  on 
nonparametric  approaches,  followed  by  a  graph-theoretic  overview  of  matching.  The 
chapter  concludes  with  a  review  of  the  most  recent  work  on  this  problem  based  on  non- 
bipartite  matching  (defined  in  the  next  chapter),  which  is  our  primary  area  of  interest 
here.  In  Chapter  III,  we  propose  new  statistical  tests  based  on  non-bipartite  matching. 
First,  we  introduce  the  Sum  of  Pair-Maxima  (SPM)  test  and  the  Non-Bipartite 
Accumulated  Pairs  (NAP)  test,  and  develop  the  supporting  theory  for  these  tests  in  some 
detail.  Of  primary  importance  are  the  proof  of  a  central  limit  theorem  for  the  SPM  test 
and  the  derivation  of  the  exact  distribution  for  the  NAP  test.  Then  we  introduce  the 
dominant  test  of  this  paper,  the  Ensemble  Sum  of  Pair-Maxima  (ESPM)  test,  which  is  an 
extension  of  the  SPM  test  that  involves  extracting  additional  change-point  infonnation 
from  orthogonal  matchings.  Finally,  we  present  the  Bipartite  Accumulated  Pairs  (BAP) 
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test  as  an  application  of  bipartite  matching  techniques  to  change-point  problems  where 
some  observation  history  is  available.  Chapter  IV  demonstrates  the  perfonnance  of  the 
SPM,  NAP,  and  ESPM  tests  by  means  of  a  simulation  study.  The  power  of  these  tests  is 
compared  to  a  state-of-the-art  parametric  competitor,  the  maximum  likelihood  ratio  test 
of  James,  James,  and  Siegmund  (1992),  for  various  cases  including  different  underlying 
distributions,  different  types  and  magnitudes  of  change,  and  different  dissimilarity 
measures.  Chapter  V  summarizes  our  findings  and  outlines  opportunities  for  further 
research  in  this  field. 
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II.  PROBLEM  BACKGROUND  AND  LITERATURE  REVIEW 


A.  PROBLEM  FORMULATION 

This  research  addresses  the  following  specific  question:  Given  a  sequence  of 
independent  multivariate  observations,  is  the  sample  statistically  homogeneous?  In  other 
words,  has  the  underlying  probability  distribution  from  which  the  observations  were 
drawn  remained  constant  or  has  it  changed ?  As  discussed  in  the  previous  chapter,  this 
problem  emerges  in  a  wide  variety  of  applications  and  is  traditionally  referred  to  as  “the 
change-point  problem.” 

1.  Change  Points 

We  define  the  term  change  point  as  follows:  Given  a  sequence  of  independent 
random  variables  (JpX,,...,  Jj,),  let  F.  denote  the  probability  distribution  of  X r 

Then  an  integer  te{2,...,N }  is  a  change  point  with  respect  to  measure  6  if 

F1=F2=  —  =  FT-1>  Fr-\^Fr  and  <5  ,  F} )  -  max^ j}  6  [Fk  ,  F} )  is  strictly  positive 

over  j  e  {r, . . . ,  N} ,  where  6  is  a  measure  of  distance  between  two  probability 
distributions.  The  Fj  (  j  >  r  )  are  not  necessarily  distinct  from  one  another;  for  example, 

the  distribution  change  at  r  may  be  associated  with  an  abrupt  mean  change,  or  “mean 
jump.”  Or  Fj  may  some  simple  function  of  j  >  r  ;  for  example,  the  distribution  change 

beginning  at  r  may  be  associated  with  a  gradual  mean  change,  or  “mean  drift.”  More 
complicated  forms  for  F.  are  allowed  as  well.  A  formulation  of  the  general  change-point 

problem  in  a  hypothesis-testing  framework  with  respect  to  observations  Xl,X2,...,XN 

consists  of  defining  null  hypothesis 

(2.1)  H0:  Fl=F2=-  =  FN 

and  corresponding  alternative  hypothesis 
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Hl :  There  exists  an  integer  r,  2  <  r0  <  r  <  rx  <  N,  such  that 
(2.2)  Fl=F2=-  =  FT_1,  Ft_x  *  Fr ,  and 

6 (fy ,Fj)~  ma\ke{T  j}  6 (Fk ,  F )  is  strictly  positive  over  j  e  {r, . . F}. 

Usually  we  take  r0  =  2  and  tx=  N  but  in  some  cases  may  wish  to  restrict  the  change 
point  to  a  narrower  interval. 

An  important  taxonomy  exists  within  the  family  of  change-point  problems;  we 
present  a  particular  classification  scheme  here,  similar  to  Brodsky  and  Darkhovsky 
(1993)  and  Basseville  and  Nikiforov  (1993),  to  serve  as  a  concept  map  of  sorts,  to 
provide  a  framework  for  review  of  the  literature  in  this  field,  and  to  make  clear  the 
classification  of  the  problem  for  which  we  offer  solutions. 

2.  Taxonomy  of  Change-Point  Problems 

Change-point  problems  can  be  classified  into  the  two  broad  categories  of 
sequential  and  nonsequential  analysis.  Sequential  analysis  involves  detecting  the 
occurrence  of  a  change  while  monitoring  a  system  “on-line;”  that  is,  data  collection  is 
ongoing  in  time  and  analysis  is  performed  sequentially  as  the  data  set  is  updated.  For 
such  cases,  the  null  hypothesis  is  tested  over  and  over  again  as  new  data  are  added  to  the 
set  of  observations  as  follows: 

1)  With  observations  Xl,...,Xt_1  on  hand,  add  Xt. 

2)  Test  for  a  change  point  in  {2 

a)  If  a  change  point  is  detected,  perform  some  predetennined  action. 

b)  If  not,  go  back  to  step  (1)  with  Xv...,Xt  on  hand. 

A  classic  application  of  sequential  analysis  is  in  the  arena  of  quality  control,  where  some 
particular  process  is  ongoing  in  time  and  process  outputs  are  sampled  and  tested  in 
sequence  to  identify  if  some  undesirable  change  in  the  process  has  occurred. 
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In  contrast,  non-sequential  analysis  involves  the  examination  of  a  finite  sequence 
of  data  “off-line”  with  the  purpose  of  identifying  whether  or  not  some  change  has 
occurred  during  the  observation  period.  Non-sequential  analysis  can  be  divided  further 
into  the  cases  of  a  single  test  or  simultaneous  tests.  For  a  single  test,  one  considers  a 
sequence  of  multivariate  observations  (Xx,X2,...,XN}  and  a  known  or  assumed  time 

tg{2,...,A}  and  then  tests  the  null  hypothesis  H0 :  FX=F2=--=FN  against  the 
alternative  Hx :  Fx  =  F2  =  ■  ■  •  =  Ft_x  ^  Ft=  Fr+l  =■■■  =  FN  (or  perhaps  a  one-sided 

alternative).  One  example  of  this  type  of  problem  is  the  clinical  trial  scenario,  where  two 
groups  of  subjects  are  drawn  from  some  common  population,  one  group  is  administered  a 
treatment  and  the  other  a  placebo,  and  the  problem  is  to  test  whether  the  treatment  has 
some  particular  effect.  In  the  single  test  framework,  N  specimens  are  split  into  a  control 
and  a  test  group  with  { X1,...,XT_1 }  and  {XT,...,XN}  being  the  two  associated  sets  of 

observations  (the  order  of  observations  within  groups  does  not  matter  for  this  case),  and  a 
single  hypothesis  test  is  performed  regarding  r  (hence  the  name  “single  test”). 

For  simultaneous  tests,  no  specific  candidate  for  change  point  r  is  assumed;  the 
null  hypothesis  (2.1)  is  tested  against  alternative  (2.2)  as  stated.  Such  tests  are 
simultaneous  in  the  sense  that  they  involve  testing  all  possible  change  points 
simultaneously.  An  example  of  such  a  test  relates  to  machinery  health  management: 
Consider  a  military  aircraft  with  an  on-board  device  that  records  various  measurements 
on  the  aircraft  with  respect  to  time  during  flight;  these  measurements  are  known  or 
believed  to  be  indicators  of  aircraft  health.  Assume  for  the  sake  of  this  example  that  the 
observations  are  independent  with  respect  to  time.  Let  (Xx,X2,...,XN)  be  the  sequence 

of  these  observations.  When  the  aircraft  returns  from  its  mission,  these  observations  are 
analyzed  for  evidence  of  health  degradation.  That  is,  do  the  observations  provide 
evidence  of  a  change  from  some  “healthy”  distribution  to  a  “less  healthy”  one?  If  so, 
may  one  infer  when  the  degradation  occurred  (or  began  to  occur)? 

Machinery  health  diagnosis  and  prognosis  problems  are  strong  motivations  for 

our  research  effort;  consequently,  the  focus  of  this  research  is  non-sequential 

simultaneous  testing.  Our  literature  survey  finds  that  powerful  nonparametric  tests  for 
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multivariate  change-point  problems  are  not  abundantly  available,  particularly  in  the  non¬ 
sequential  simultaneous  testing  case.  We  now  proceed  to  review  several  parametric  and 
nonparametric  approaches  to  univariate  and  multivariate  change-point  problems, 
concluding  with  a  discussion  of  the  graph-theoretic  concept  of  matching  and  its 
application  to  such  problems.  This  will  set  the  stage  for  our  introduction  in  the  next 
chapter  of  new  matching-based  solutions  to  the  change-point  problem. 

B.  PARAMETRIC  APPROACHES 

1.  Univariate  Case 

The  two-sample  /-test,  described  in  Tanis  and  Hogg  (2008),  is  perhaps  the  most 
widely  known  test  for  heterogeneity,  though  it  generally  is  not  presented  in  change-point 
terms.  It  is  one  of  the  first  tests  introduced  in  an  undergraduate  statistics  course,  and  it 
usually  is  framed  as  a  test  for  a  difference  in  the  mean  of  two  samples.  While  it  is  widely 
applicable,  it  has  the  obvious  limitations  that  1)  it  applies  only  the  univariate  case,  2)  it 
assumes  the  underlying  distributions  are  normal,  and  3)  it  only  tests  for  differences  in 
distribution  means. 

In  quality  assurance,  the  classic  univariate  sequential  test  for  a  change  point  is  the 
cumulative  sum  (CUSUM)  test  introduced  by  Page  (1954).  Others  include  the  sequential 
/-test  (Rushton,  1950),  Geometric  Moving  Average  (GMA)  or  Exponentially  Weighted 
Moving  Average  (EWMA)  procedures  (Roberts,  1959),  and  Bayes-type  procedures 
(Girshick  and  Rubin,  1952;  Shirayev,  1963).  These  procedures  are  powerful  in  many 
settings  and  can  be  applied  as  non-sequential  or  sequential  change-point  tests.  Of  course 
these  tests  are  also  limited  to  univariate  cases,  although  some  multivariate  extensions 
exist.  Additionally,  they  require  assumptions  about  the  underlying  data  distribution 
(normality  is  usually  assumed). 

2.  Multivariate  Case 

The  multivariate  analog  of  the  two-sample  /-test  is  Hotelling's  two-sample  T2  test 
(Hotelling,  1931),  which  detects  differences  in  the  mean  vectors  of  two  multivariate 
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normal  samples.  Hotelling’s  T2  statistic  is  a  non- sequential  single  test.  Sequential 
multivariate  parametric  tests  include  multivariate  extensions  of  CUSUM  procedures 
(Basseville  and  Nikiforov,  1993;  Runger  and  Testik,  2004)  and  EWMA  procedures 
(Lowry  et  al,  1992;  Prabhu  and  Runger,  1997). 


A  change-point  test  by  James,  James,  and  Siegmund  (hereafter,  “JJS”)  (1992) 
associated  with  an  abrupt  change  in  the  mean  of  a  multivariate  normal  distribution 
interests  us  particularly.  JJS  is  a  non- sequential  simultaneous  test — the  area  of  our 
interest  in  this  work — and  it  serves  as  a  powerful  competitor  to  methods  we  present  later. 
Given  multivariate  observations  Xx, X2,...,XN,  JJS  uses  the  modified  likelihood  ratio 
test  statistic 


(2.3) 


JJS 


max 

*„<*<*! 


N 

k(N-k) 


UA 


~sA 


k  k 

where  Sk  =  E*,>  u>  =  ^  A. A/ ,  and  k0  and  kx  are  the  lower  and  upper  limits, 

i= l  ;= l 

respectively,  of  the  interval  possibly  containing  a  change  point.  The  actual  likelihood  test 
is  based  on  the  case  k0  =  1  and  kx  =  N  —  1  (that  is,  the  change  point  could  be  anywhere 
in  the  observation  sequence),  but  the  test  provides  the  flexibility  to  limit  the  search  for  a 
change  point  to  a  subinterval  of  (l,  N  —  l) .  JJS  show  that  the  tail  probabilities  for  this  test 
can  be  well-approximated  by 


P(Tns  >x)^ 


(2.4) 


r(p/2) 


Nx 


p/2 


(i-x) 


(N-p-i)!2 


V 


V/2 


i(l  —  t)(l  —  x) 


dt, 


where  k0l  N  —>t0  and  kx  /  N  — >•  tx  as  N  — >  oo  ,  0  <  t0  <  tx  <  1 ,  and 


(2.5) 


2 

?exp 


with  $  being  the  standard  nonnal  cumulative  distribution  function.  The  integral  term  in 
(2.4)  is  computed  satisfactorily  by  numerical  methods.  JJS  present  a  heuristic 
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modification  to  improve  their  tail  probability  approximation;  we  do  not  describe  those 
details  here,  but  we  do  utilize  the  modification  of  the  JJS  test  for  comparison  purposes 
later. 

A  drawback  to  all  these  parametric  approaches  to  change-point  detection  is  that 
they  are  not  necessarily  robust  across  different  underlying  data  distributions.  In 
particular,  the  assumption  of  multivariate  normality  in  many  cases  proves  to  be  difficult 
to  justify.  We  proceed  to  review  some  existing  nonparametric  approaches  to  the  change- 
point  problem. 

C.  NONPARAMETRIC  APPROACHES 

1.  Univariate  Case 

A  classic  nonparametric  test  to  determine  whether  two  univariate  random  samples 
come  from  the  same  population  is  the  Mann- Whitney  test  (Mann  and  Whitney,  1947), 
also  known  as  the  Wilcoxon  rank  sum  test  (Conover,  1999).  Let  A,  =  {Aj,...,Xffl}  and 

A2  =  {*„«.-  ,  Xm+n }  be  two  sets  of  observations  with  all  X.  being  members  of  some 
ordered  set.  Assign  ranks  (or  midranks  in  the  case  of  ties)  to  the  observations  with 
respect  to  set  ordering  and  let  R{X.)  denote  the  rank  of  observation  X r  Intuitively,  one 
would  expect  that  if  the  observations  in  A,  tend  to  be  smaller  than  those  in  A2  then  the 
ranks  of  the  observations  in  Aj  would  be  smaller  on  average  than  the  ranks  of  those  in 
At  To  determine  whether  A,  and  A2  are  drawn  from  the  same  population  one 
computes  the  Mann- Whitney  test  statistic,  which  is  simply 

m 

(2-6)  7mw=E*(V). 

i= 1 

Let  F1  and  F2  be  the  distribution  functions  corresponding  to  the  observations  in  A,  and 
A2 ,  respectively,  and  let  Y  ~  Fl  and  Z  ~  F2 .  Then  the  hypotheses  associated  with  the 
Mann- Whitney  test  may  be  stated  as  follows. 
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H0 :  P(Y>Z  =0.5; 

(2.7)  V  ’ 

Hx\  P(Y>Z)*  0.5. 

For  small  samples,  quantiles  of  TMW  are  found  in  tables  or  by  using  standard  functions  in 
statistical  software  (such  the  “qwilcox”  function  in  R  (2005)).  For  large  samples  Tuw  is 

asymptotically  normal.  While  the  Mann-Whitney  test  is  consistent  against  mean 
difference  alternatives,  it  is  not  sensitive  to  other  types  of  differences  (for  example, 
differences  in  scale). 

Another  rank-based  non-sequential  single  test,  which  is  consistent  against  a 
broader  range  of  alternatives  but  is  less  powerful  than  the  Mann-Whitney  test  (Conover, 
1999),  is  the  Wald-Wolfowitz  runs  test  (Wald  and  Wolfowitz,  1940).  Observation  ranks 
are  computed  as  above,  but  this  time  the  ranks  are  collected  into  runs  of  consecutive 
ranks  that  come  from  the  same  group  (either  A,  or  A2 ).  The  test  statistic  is  simply  the 

number  of  runs  in  the  collection;  a  relatively  small  number  of  runs  indicates  that  the  two 
samples  are  from  different  distributions.  Like  the  Mann-Whitney  test,  the  Wald- 
Wolfowitz  test  is  asymptotically  nonnal. 

Two  other  tests  that  are  consistent  against  any  type  of  difference  that  might  exist 
between  underlying  distributions  are  the  Kolmogorov-Smirnov  test  and  the  Cramer-von 
Mises  test.  If  Sx  and  S2  are  the  empirical  distribution  functions  for  the  observations  in 

A,  and  A, ,  respectively,  then  the  Kolmogorov-Smirnov  test  statistic  is  given  by 

(2.8)  Tks  =  sup|S1(x)-£2(x)| 

X 

and  the  Cramer-von  Mises  test  statistic  is  given  by 

(2-9)  rc.M=— ^  £  [S.W-^Wf- 

{m  +  n)  xeA|UA, 

In  other  words,  these  tests  statistics  evaluate  the  supremum  norm  and  L  nonn, 
respectively,  of  the  difference  between  two  empirical  distribution  functions.  Large 
values  of  these  statistics  are  evidence  that  A,  and  A2  are  drawn  from  different 

probability  distributions. 
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These  four  particular  tests  broadly  represent  the  two  primary  techniques  used  for 
nonparametric  change-point  detection  both  in  univariate  and  multivariate  cases: 
techniques  based  on  rank  permutations  (such  as  Mann- Whitney  and  Wald-Wolfowitz) 
and  tests  based  on  distribution  function  estimation  (such  as  Kolmogorov-Smirnov  and 
Cramer-von  Mises).  The  new  methods  we  present  in  the  next  chapter  are  grounded  in 
rank  permutation  arguments. 

Nonparametric  sequential  tests  include  generalizations  of  CUSUM  procedures 
such  as  those  introduced  by  Bhattacharya  and  Frierson  (1981)  and  Gordon  and  Pollock 
(1995).  These  procedures  apply  to  the  univariate  case  only;  no  extensions  of  these  tests 
to  the  multivariate  case  have  yet  been  proposed  (Fricker  and  Chang,  2009). 

2.  Multivariate  Case 

a.  General  Approaches 

Multivariate  extensions  do  exist  for  both  the  Kolmogorov-Smirnov  and 
Cramer-von  Mises  tests.  For  example,  Bickel  (1969)  extends  the  Kolmogorov-Smirnov 
test  to  iC  by  defining  multivariate  rank  vectors,  computing  empirical  distribution 
functions  with  respect  to  within-group  multivariate  ranks,  and  then  evaluating  the 
supremum  norm  on  the  difference  of  the  two  empirical  distribution  functions  as  a  test 
statistic.  Prsestgaard  (1995)  extends  Bickel’s  result  to  more  general  sample  spaces. 
Baringhaus  and  Franz  (2004)  propose  a  Cramer-von  Mises-like  statistic  on  R"'  by 
comparing  the  average  Euclidean  distance  between  points  in  different  groups  to  the 
average  distance  between  points  in  the  same  group.  Hall  and  Tajvidi  (2002)  propose  a 
pennutation  test  using  a  nearest-neighbors  approach  that  generalizes  both  the 
Kolmogorov-Smirnov  and  Cramer-von  Mises  tests  to  the  multivariate  case.  These  tests 
all  rely  on  simulation  to  compute  estimated  quantiles  for  the  test  statistic  null  distribution. 

Li  and  Liu  (2004)  apply  the  notion  of  data  depth  to  nonparametric  tests  for 
changes  in  multivariate  location  or  scale.  Data  depth  is  a  way  of  measuring  how  “deep” 
or  “central”  a  point  is  with  respect  to  a  particular  distribution  or  sample  (Liu  et  al.,  1999). 
Well-known  examples  of  such  measures  in  the  data  depth  literature  include  half-space 
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depth  (Hodges  (1955)  ,  Tukey  (1975),  sometimes  referred  to  as  Tukey  depth),  convex 
hull  peeling  depth  (Barnett,  1976),  and  simplicial  depth  (Liu,  1990).  Data  depths 
provide  a  natural  center-outward  ordering  of  points  in  a  multivariate  sample;  once 
ordered,  various  univariate  tests  for  change  may  be  applied  with  respect  to  the  ordering. 

Fricker  and  Chang  (2009)  propose  a  sequential  change-point  test  which 
uses  an  available  history  of  multivariate  observations  combined  with  the  k  most  recent 
observations  (where  k  is  an  adjustable  window  parameter)  to  generate  a  nonparametric 
running  estimate  of  the  underlying  density  distribution.  The  history  is  assumed  to  be  in 
control;  that  is,  the  historical  observations  are  all  drawn  from  the  null  distribution  with  no 
change  point.  With  each  new  observation  they  compute  a  new  density  estimate  and  then 
perform  a  Kolmogorov-Smirnov  test  to  identify  whether  the  density  heights  of  the  data  of 
interest  are  uniformly  distributed  among  the  density  heights  of  the  historical  data. 

b.  Graph-Theoretic  Approaches  and  Matching 

An  intriguing  approach  to  the  change-point  problem  involves  applying 
graph-theoretic  ideas.  In  particular,  methods  based  on  the  graph-theoretic  concept  of 
matching  have  gained  interest  in  recent  years,  in  no  small  part  due  to  increases  in 
computational  capacity.  The  test  statistics  we  propose  in  this  work  are  all  matching- 
based;  therefore,  we  conclude  this  chapter  by  providing  necessary  graph  theory 
definitions  and  background  to  develop  our  ideas  and  reviewing  graph-theoretic 
approaches  introduced  by  Friedman  and  Rafsky  (1979)  and  Rosenbaum  (2005)  which 
have  inspired  our  work. 

(1)  Definitions.  The  definitions  in  this  section  are  from  Chartrand 
and  Zhang  (2005).  A  graph  G  consists  of  a  finite  nonempty  set  V  of  elements  called 
vertices  and  a  set  E  of  two-element  (unordered)  subsets  of  V  called  edges,  in  which  case 
we  write  G  =  (V,E ).  A  graph  Gi=(Vx,E^  is  called  a  subgraph  of  G  =  (V,E )  if 

fj  <=  L  and  E1  c=  E ;  if  fj  =  V  then  G,  is  called  a  spanning  subgraph  of  G  .  We  denote 
the  edge  joining  vertices  u  and  v  by  {w,v}.  Two  distinct  vertices  are  adjacent  vertices 
if  they  are  joined  by  an  edge,  and  two  distinct  edges  are  adjacent  edges  if  they  share  a 
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vertex.  Vertex  u  and  edge  {«,  v}  are  said  to  be  incident  with  each  other,  and  the  degree 
of  vertex  u  is  the  number  of  edges  incident  with  u  . 


A  u  —  v  walk  in  graph  G  is  a  sequence  of  vertices  in  G  beginning 
with  u  and  ending  with  v  such  that  consecutive  vertices  in  the  sequence  are  adjacent;  if 
u  —  v  then  the  walk  is  closed.  A  u  —  v  trail  is  a  walk  in  which  no  edge  is  traversed  more 
than  once;  a  circuit  is  a  closed  trail  including  at  least  three  distinct  vertices.  A  circuit 
that  repeats  no  vertex  except  for  the  first  and  last  is  a  cycle.  If  there  exists  a  u  —  v  walk 
for  every  pair  of  vertices  u  and  v  in  graph  G  then  G  is  said  to  be  connected. 

A  graph  G  is  called  acyclic  if  it  has  no  cycles,  a  tree  is  an  acyclic 
connected  graph,  and  a  spanning  tree  of  G  is  a  spanning  subgraph  of  G  that  is  a  tree. 
If  a  real  number  is  assigned  to  each  edge  of  a  graph,  then  the  graph  is  a  weighted  graph 
and  the  sum  of  the  all  the  edge  weights  is  called  the  weight  of  the  graph.  A  spanning  tree 
of  weighted  graph  G  whose  weight  is  smallest  among  all  spanning  trees  of  G  is  called  a 
minimum  spanning  tree  (MST). 

Friedman  and  Rafsky  (1979)  consider  various  statistics  based  upon 
MSTs  in  order  to  test  whether  two  samples  are  drawn  from  the  same  distribution.  Given 
sets  of  observations  and  as  above,  they  construct  a  MST  with  respect  to  some  cost 

function  on  the  sample  space.  One  change-point  detection  method  they  consider  consists 
of  removing  each  edge  of  the  MST  that  connects  a  point  in  At  to  a  point  in  A2 ,  and  then 

defining  a  test  statistic  that  counts  the  number  of  disjoint  subtrees  that  result  from  the 
edge  removal.  The  resulting  test  is  effectively  a  multivariate  runs  test,  which  corresponds 
exactly  to  the  Wald-Wolfowitz  runs  test  for  the  univariate  case.  The  Wald-Wolfowitz 
runs  test  is  known  to  be  not  particularly  powerful  (Connover,  1999,  p.  3).  Friedman  and 
Rafsky  (1999)  demonstrate  that  their  multivariate  runs  test  has  high  power  in  higher 
dimensions,  and  they  enhance  test  power  by  computing  their  test  statistic  on  a  collection 
of  orthogonal  MSTs,  where  two  MSTs  are  orthogonal  if  they  have  no  edges  in  common. 
We  will  use  a  similar  idea  to  extend  our  main  results  for  improved  power. 
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We  require  a  few  final  definitions  to  develop  our  main  results.  A 
subset  of  edges  E'  c:  E  is  independent  if  no  two  edges  in  E'  are  adjacent.  A  matching 

in  a  graph  G  =  ( V,E )  is  an  independent  set  of  edges  in  G.  A  maximum  matching  in  G 

is  a  matching  that  consists  of  at  least  as  many  edges  as  any  other  possible  matching  in  G. 
For  the  remainder  of  this  paper,  all  matchings  under  consideration  are  maximum 
matchings  (that  is,  we  are  interested  only  in  matchings  that  pair  together  as  many  vertices 
as  possible).  Finally,  a  perfect  matching  in  G  is  a  matching  that  includes  every  vertex  of 
G.  A  perfect  matching  is  necessarily  a  maximum  matching;  furthermore,  a  perfect 
matching  is  possible  only  on  graphs  with  an  even  number  of  vertices. 

(2)  Bipartite  and  Non-Bipartite  Matching.  A  variety  of  problems 
can  be  framed  as  matching  problems,  where  a  matching  is  sought  that  minimizes  some 
cost  (Ahuja  et  al.,  1993,  p.  9).  Two  specific  cases  are  bipartite  matching  problems, 
where  graph  vertices  are  divided  into  two  distinct  subsets  A,  and  A2  and  each  edge 

consists  of  one  vertex  each  from  A,  and  A2 ,  and  non-bipartite  matching  problems, 

where  the  matching  does  not  depend  on  a  partition  of  the  vertices  (that  is,  any  two 
vertices  may  be  paired  in  the  matching). 

In  our  case,  we  are  interested  in  matchings  of  minimum  cost, 
where  a  cost  function  is  defined  as  follows:  Given  sample  space  S  ,  c:S  xS  — >[0,oo) 
is  a  cost  function  if  it  satisfies 

(2.10)  c(x,x)  =  0  VxgS 

and 

(2.11)  c(x,y)  =  c(y,x)  Vx,y  eS  , 

We  use  c.j  to  denote  the  cost  c  ( A( ,  X.  j  and  also  sometimes  use  the  common 
tenninology  that  c..  is  the  weight  of  the  edge  joining  X j  and  X. .  We  will  call  cost 
function  c  a  distance  function  if  it  satisfies  the  triangle  inequality 

(2.12)  c(x,z)<c(x,y)+c(y,z)  Wx,y,zeS 
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in  addition  to  (2.10)  and  (2.11)  .  In  general,  this  framework  allows  broad  flexibility  to 
accommodate  all  types  of  data  (discrete  or  continuous,  numeric  or  categorical,  etc.).  In 
the  change-point  setting,  a  cost  function  should  be  tailored  in  some  sensible  way  to  the 
data  types  of  interest  and  its  selection  ultimately  does  matter  in  detecting  departures  from 
the  null  hypothesis. 

We  formulate  the  minimum-weight  bipartite  matching  problem  in 
the  following  manner.  Given  sample  space  S  ,  two  distinct  sets  of  observations 
A,  ={I1,...,IB}CS  and  A2={lB+1,...,Jn+n}cS  ,  N  =  m  +  n  ,  and  cost  function 
c,  a  minimum-weight  bipartite  matching  is  a  solution  to  the  problem 

m  N 

X  X  cuxu 

i= 1  j=m+ 1 
N 

X  xij  -  1  Vz  e{l,...,m}, 

j=m+l 
m 

Xxo-'  V/e{w  +  l,...,JV}, 

1=1 

m  N 

X  X  xtJ=mm(m,n), 

i= 1  j=m+l 

Xtj  e  {0,1}  V/e{l,. \/j  + 

where  xij  =1  indicates  that  edge  j  X. ,  Xj  |  is  in  the  matching  and  xtj  =0  otherwise.  In 

the  graph  underlying  this  problem,  each  element  of  A,  is  joined  by  an  edge  to  each 

element  of  A2 ;  no  edges  join  elements  within  A,  or  within  A., .  The  solution  to  this 

problem  is  not  necessarily  unique.  In  operations  research  literature,  this  problem  is  a 
particular  instance  of  the  “general  assignment  problem”  (Ahuja  et  al.,  1993,  pp.  639- 
640).  Software  algorithms  to  solve  this  problem  include  that  of  Jonker  and  Volgenant 
(1987)  and  are  widely  available. 

Alternatively,  we  formulate  the  minimum-weight  non-bipartite 
matching  problem  as  follows.  Given  sample  space  S  ,  a  single  set  of  observations 
A  —  {A, , . . . ,  Av  }  C  S  ,  and  cost  function  c ,  a  minimum  weight  non-bipartite  matching 
is  a  solution  to  the  problem 


minimize 

X 

subject  to 

(2.13) 
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minimize 

X 

N- 1  N 

X  X  cuxu 

1=1  j=i+ 1 

(2.14) 

subject  to 

k- 1  N 

S*»+5>*y£1  Vis{l,...,iV-l}, 

1=1  j=k+ 1 

N- 1  N 

ZZ^=LJV/2J. 

i=l  j=i+ 1 

xij  e  {0,1}  Vye  {/  +  !,...,  A},  V/ e  {l,...,JV-l} 

where  [  vj  is  the  smallest  integer  less  than  or  equal  to  y  and  x. .  indicates  whether 

|  Xj ,  X j  | ,  i  <  j ,  is  in  the  matching  as  in  the  bipartite  matching  case.  In  this  case,  every 

pair  of  elements  in  A  is  joined  by  an  edge  and  the  underlying  graph  is  referred  to  as  a 
complete  graph.  The  solution  to  this  problem  is  a  matching  that  consists  of  N  /  2  edges 
with  every  vertex  included  in  the  matching  if  N  is  even,  or  ( /V  -  I )  /  2  edges  with  every 

vertex  but  one  included  in  the  matching  if  N  is  odd.  As  in  the  bipartite  matching  case, 
the  solution  to  the  non-bipartite  matching  problem  is  not  necessarily  unique.  Algorithms 
by  Edmonds  (1965)  and  Derigs  (1988)  solve  the  minimum- weight  non-bipartite  matching 

problem  on  a  complete  graph  in  0(A3)  time.  Several  improvements  to  Edmond’s 

algorithm  have  been  developed  over  the  years  (Gabow,  1973;  Galil  et  al.,  1986;  Gabow 
et  al,  1989;  Cook  and  Rohe,  1999;  Mehlhom  and  Schafer,  2002;  Kolmogorov,  2009). 
While  these  improvements  do  not  improve  theoretical  runtime  on  a  complete  graph,  they 
have  been  shown  to  improve  realized  runtimes  in  many  practical  instances.  Edmond’s 
original  algorithm  to  solve  the  non-bipartite  weighted  matching  problem  is  sometimes 
called  Edmond’s  “blossom”  algorithm,  where  the  term  “blossom”  refers  to  subgraphs 
with  particular  properties  that  emerge  during  execution  of  the  algorithm.  Subsequent 
improvements  often  carry  the  “blossom”  moniker;  the  latest  is  Kolmogorov’s  “Blossom 
V”  (2009).  In  Chapter  IV,  we  elaborate  on  computational  details  specific  to  our 
implementation  of  non-bipartite  matching  algorithms  for  this  study. 

(3)  Cost  Functions.  In  both  the  bipartite  and  non-bipartite  cases, 
the  assignment  of  pairs  in  a  matching  depends  upon  the  choice  of  a  cost  function.  For  the 
problem  of  testing  for  data  homogeneity  with  respect  to  some  ordering,  we  will  use  cost 
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functions  that  are  reasonable  dissimilarity  measures.  Some  applications  will  invite  a 
natural  choice  of  dissimilarity  measure;  for  other  applications  this  choice  may  require 
some  deliberation.  In  our  simulation  study  we  consider  four  different  distance  functions 
on  the  sample  space  S  =  .  The  first  is  Euclidean  distance  (ED),  which  yields 


(2.15) 


ED 

ca  = 


(x.-xjfr-*,) 


1/2 


One  disadvantage  of  ED  is  that  it  does  not  take  into  account 
measurement  scale  or  correlation  among  data  components.  To  address  these  issues, 
Mahalanobis  distance  (MD)  is  often  used  as  an  alternative,  which  is  scale-invariant  and 
accounts  for  data  correlation.  The  resulting  cost  function  is 


(2.16) 


MD 

cu  = 


where  V  is  an  estimate  of  the  covariance  matrix  associated  with  the  data  of  interest. 
Estimating  V  by  the  sample  covariance  matrix  is  sensitive  to  outliers,  however,  so  we  are 
also  interested  in  distance  measures  that  are  robust  to  such  outliers.  Wang  and  Raferty 
(2002)  provide  a  useful  method  to  downweight  outliers,  called  nearest-neighbor  variance 
estimation  (NNVE)  that  measures  how  outlying  a  data  point  is  by  the  standardized 
distance  between  the  point  and  its  kth  nearest  neighbor  (where  k  is  an  adjustable 
parameter).  Therefore,  we  consider  a  third  distance  function,  which  we  will  refer  to  as 
“Mahalanobis  distance,  robust”  (MD-R),  which  is  simply  the  MD  computed  with 
reference  to  the  NNVE  covariance  estimate  fj^yp  (we  omit  any  k  subscript  to  simplify 


notation  here): 
(2.17) 


MD 

%  = 


1 1/2 


Another  way  to  reduce  sensitivity  to  outliers  is  to  consider  a 
distance  measure  based  on  the  idea  of  multivariate  ranks.  We  adopt  a  useful  extension  of 
rank  to  higher  dimension  given  by  Barnett  (1976)  and  Chaudhuri  (1996),  described  as 
follows  by  Choi  and  Marden  (1997).  Consider  a  d  -dimensional  inner  product  space  S 
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and  observations  Xx  ,...,XN  G  S  .  For  d  —  1 ,  denote  the  rank  of  observation  X;  by  r[,) , 
assigning  midranks  in  the  case  of  ties.  Center  and  scale  these  ranks  by  the  transformation 

2  r(,)  —  n  —  1 


(2.18) 


R(x,)- 


which  maps  , . . . , r(A  J  |  into  the  open  interval  (—  1,  l) .  Then  (2.18)  can  be  written 


(2.19) 


where  sgn  is  the  signum  function 


(2.20) 


sgn(x) 


0 

x 

be  I 


if  x  =  0 ; 
otherwise. 


Now  for  X.  ^  X j ,  the  summand  in  (2.19)  can  be  expressed  as 


(2.21) 


sgnfz,-^)-  X‘  X‘ 


\x,-x. 


Then  a  very  natural  extension  of  this  centered  and  scaled  ranking  transformation  to  the 
case  d  >  1  is  obtained  by  defining 


(2.22)  K(X,)=  '  Y\  X‘  X' , . 

where  II  •  II  may  be  any  norm  in  general;  unless  otherwise  specified  we  will  use 


X  =  yjx'X  VX  e  S.  We  will  use  the  term  “multivariate  rank  distance”  (RD)  to  refer 


to  the  Euclidean  distance  between  multivariate  ranks  as  our  fourth  and  final  distance 
option: 


(2.23) 


1/2 


In  Chapter  IV,  we  examine  the  impact  these  different  distance 
functions  have  on  our  ability  to  detect  change  with  the  new  change-point  detection 
methods  presented  in  the  next  chapter. 
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(4)  Matching  Examples.  We  now  illustrate  these  matching 
concepts  with  a  few  simple  examples.  Figures  1,  2,  and  3  show  the  same  20 
observations,  each  drawn  from  a  bivariate  normal  distribution,  with  Euclidean  distance  as 
the  cost  function.  The  coordinates  for  these  data  are  listed  in  Appendix  D.  In  Figure  1, 
the  observations  are  randomly  partitioned  into  two  subsets  containing  8  and  12 
observations  indicated  by  group  labels  ‘  □  ’  and  ‘  *  ’,  respectively;  the  resulting  minimum- 
weight  bipartite  matching  consists  of  8  pairs,  with  4  observations  from  the  larger  set 
remaining  unmatched.  A  quite  different  matching  results  when  the  observations  are 
partitioned  in  a  different  way,  as  shown  in  Figure  2  with  two  subsets  of  equal  size. 
Finally,  Figure  3  demonstrates  the  minimum-weight  non-bipartite  matching  on  these 
same  20  points  with  no  prior  partitioning.  These  examples  reiterate  two  fundamental 
differences  between  bipartite  and  non-bipartite  matchings.  First,  the  non-bipartite 
matching  is  not  associated  with  any  prior  partitioning  of  the  observation  set,  while  the 
bipartite  matching  depends  greatly  on  how  the  observation  set  is  partitioned.  Indeed,  for 
an  even  number  of  observations  one  may  think  of  the  minimum-weight  non-bipartite 
matching  as  the  lowest  cost  matching  among  all  minimum-weight  bipartite  matchings 
computed  for  all  possible  partitions  of  equal  size.  Second,  in  the  case  of  non-bipartite 
matching  for  an  even  number  of  observations,  all  observations  are  paired  with  another 
(all  but  one  are  paired  if  the  number  of  observations  is  odd),  while  in  the  bipartite 
matching  case  some  observations  are  necessarily  left  unmatched  in  the  larger  of  the 
partition  sets. 
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Figure  1.  Minimum-weight  bipartite  matching  on  20  points;  m= 8,  n=12. 


Figure  2.  Minimum-weight  bipartite  matching  on  20  points;  »i=10,  n=10. 
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Figure  3.  Minimum-weight  non-bipartite  matching  on  20  points. 

(5)  The  Cross-Match  Statistic.  With  these  ideas  in  place,  we 
conclude  this  chapter  by  reviewing  a  recent  non-bipartite  matching-based  approach  to 
change-point  detection  by  Rosenbaum  (2005).  Rosenbaum  presents  an  exact 
distribution-free  test  to  detect  change  in  multivariate  data  using  non-bipartite  matching. 
We  explain  his  test  in  a  bit  more  detail  than  the  previous  tests,  as  his  ideas  have 
substantially  motivated  our  thinking  on  this  problem.  In  fact,  one  of  our  results  extends 
his  single  non-sequential  test  to  a  simultaneous  test. 

Consider  the  case  where  A,  =  , . . . ,  Xm  }  is  the  set  of  the  first  in 
observations  and  A2  =  {Xm+x , . . . ,  Xm+n }  is  the  set  of  the  last  n  =  N  —  m  observations. 
The  goal  is  to  test  for  equality  of  distributions  for  the  populations  associated  with  A,  and 
A, .  Compute  an  optimal  non-bipartite  matching  on  A,  U  A2  and  let  Mc  denote  the 
number  of  pairs  cross-matched  between  A,  and  A2 .  Rosenbaum  (2005)  calls  Mc 
(which  he  denotes  “  Al  ”)  the  cross-match  statistic  and  derives  its  exact  null  distribution. 
For  the  case  where  N  is  even,  this  distribution  is  given  by 
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(2.24) 


P\MC  =k)  = 


2k  (nh)\ 


m  —  k 
2 


\k\ 


n  —  k 
2 


where  k  takes  even  values  from  0  to  min  {m,n)  if  m  and  n  are  both  even,  k  takes  odd 

values  from  1  to  min {m,n)  if  m  and  n  are  both  odd,  and  p{m(  =  k)  =  0  for  all  other  k. 

The  case  where  N  is  odd  may  be  accommodated  by  introducing  a  pseudo-observation 
XN+l  such  that  c ( X , X N_  , )  =  0  Vi,  computing  a  non-bipartite  matching  on  the  pooled 

sample  {X1 XN+1 } ,  and  discarding  the  pair  that  includes  the  pseudo-observation. 
Equation  (2.24)  is  then  adjusted  to  refer  to  the  remaining  N  —  1  observations  and 
[N  —  l)/  2  pairs,  conditioned  on  the  set  membership  of  the  observation  with  which  the 
pseudo-observation  is  paired.  The  distribution  for  odd  N  becomes 

| P[Ml  =  k| XN+l  is  paired  with  an  element  of  A,  ’ 
x  P  \  x N_ ,  is  paired  with  an  element  of  A,  j 
I  p(mc  =  k\ XN+x  is  paired  with  an  element  of  A, 


P  Mc  =  k)  = 


+ 


1J  ^»U11VU  VV1U1  UI1  VIVlllVlll  W1  I  \>£ 
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where  /  ( • )  is  the  indicator  function  and  the  factorial  tenns  that  depend  on  m  or  n  are 
only  computed  when  the  factorial  argument  is  an  integer  (since  otherwise  the  indicator 
function  in  the  numerator  is  zero).  In  this  case,  k  takes  values  from  0  to  min (m,n) .  In 
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any  case,  the  null  hypothesis  that  the  distributions  underlying  A,  and  A2  are  equal  is 

rejected  for  small  values  of  Mc ,  since  different  underlying  distributions  lead  to  a 
preference  for  within-group  matching  over  cross-group  matching. 

While  (2.24)  and  (2.25)  are  exact  probabilities,  in  practice  these 
values  can  be  difficult  to  compute  for  large  N.  Rosenbaum  proves  that  under  the  null 
hypothesis,  the  conditional  distribution  of  Mc  given  m  converges  in  distribution  to  the 
nonnal  distribution  for  m  I  N  —>  p  (constant) ;  that  is, 

(2.26)  - ^ - >  N  (0,1), 


where 

(2.27) 


EM 


mn 


N- 1 


and  a. 


Var[Mc' 


2m(m  —  Y)n(n  —  1) 

(JV-3)(JV-1)= 


for  even  N.  This  enables  application  of  the  cross-match  test  to  cases  where  N  is  large. 

We  provide  a  small  example  to  illustrate  this  test:  Figure  4  shows 
the  familiar  data  cloud  from  previous  figures;  in  this  case  the  data  are  associated  with  two 
subgroups  of  equal  size.  Group  labels  ‘  □  ’  and  ‘  *  ’  are  assigned  at  random  to  model  two 
groups  of  8  and  12  points,  respectively,  that  are  drawn  from  a  common  distribution. 
Cross-matches  are  circled;  for  this  case  Rosenbaum’s  cross-match  statistic  takes  the  value 
Mc  =  4  and  has  an  associated  /?-value  of  p[mc  <  4)  =  0.48.  Clearly  no  significant 
evidence  exists  to  infer  that  the  distributions  underlying  the  two  groups  are  different. 


24 


Figure  4.  Rosenbaum’s  cross-match  statistic  with  no  change  point. 


Figure  5.  Rosenbaum’s  cross-match  statistic  with  change  point. 
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On  the  other  hand.  Figure  5  shows  the  same  scatter  plot,  where  this 
time  one  group  consists  of  the  lower-left  hand  observations  (‘  *  ’)  and  the  other  group 
consists  of  the  upper-right  hand  observations  (‘  □  ’)  .  Again,  cross-matches  are  circled, 
and  in  this  case  Mc  =  2  with  an  associated  p-\ alue  of  P(^MC  <  2)  =  0.08  .  This  /7-value 
is  certainly  stronger  evidence  that  the  two  groups  have  different  underlying  distributions; 
for  this  small  sample  size  the  smallest  achievable  /7-value  is  p[mc  =  0)  =  0.0014 . 

Other  examples  of  optimal  non-bipartite  matching  techniques 
applied  to  statistical  problems  are  found  in  Lu  et  al.  (2001),  Lu  and  Rosenbaum  (2004), 
and  Greevy  et  al.  (2004).  In  an  observational  study  of  a  media  campaign  against  drug 
abuse,  Lu  et  al.  (2001)  use  optimal  non-bipartite  matching  to  pair  teen  subjects  in  such  a 
way  that  each  pair  is  demographically  similar  but  has  markedly  different  exposure  to  the 
media  campaign.  The  evaluation  compares  stated  intentions  related  to  illegal  drugs 
among  comparable  teens  to  assess  the  effectiveness  of  the  campaign.  Lu  and  Rosenbaum 
(2004)  transform  a  tripartite  problem  of  comparing  one  test  group  to  two  control  groups 
into  a  non-bipartite  matching  problem  in  order  to  evaluate  whether  or  not  a  localized 
minimum  wage  increase  could  be  associated  with  depressed  low-wage  employment  in 
that  area.  Greevy  et  al.  (2004)  demonstrate  that  optimal  non-bipartite  matching  leads  to 
improved  covariate  balance  over  randomized  block  design  in  a  study  of  a  cardiac 
function  treatment  for  child  cancer  survivors. 

The  tests  we  propose  belong  to  the  small  but  growing  category  of 
graph-theoretic  tests  for  homogeneity  that  involve  minimizing  sums  of  interpoint  costs  on 
graphs;  this  category  includes  Friedman  and  Rafsky’s  (1979)  MST  test  and  Rosenbaum’s 
(2005)  cross-match  test.  The  null  hypothesis  of  homogeneity  implies  that  the  structure 
of  these  graphs  is  indifferent  to  group  labeling,  which  permits  the  derivation  of  null 
distributions  by  straightforward  arguments.  We  will  apply  this  principle  to  derive  null 
distribution  results  for  our  tests. 
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In  summary,  much  room  remains  for  innovative  work  in  the  field 
of  multivariate,  nonparametric  change-point  detection.  Particularly,  very  few  change- 
point  tests  exist  with  the  properties  of  being  multivariate,  simultaneous,  and  distribution- 
free.  We  proceed  now  to  present  tests  with  exactly  these  properties. 
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III.  THEORETICAL  RESULTS 


Suppose  that  we  have  an  even  number  N  =  2n>4  observations  [A, , . . . ,  X N } 

ordered  with  respect  to  time  (or  any  other  variable)  and  we  want  to  test  whether  the 
observations  are  changing  with  respect  to  this  ordering.  For  example,  we  might  want  to 
test  for  a  jump  or  a  drift  beginning  at  some  unknown  point  in  the  sequence.  The 
observations  may  be  multivariate,  but  are  assumed  to  be  independent.  The  requirement 
that  N  be  even  is  not  strictly  necessary,  but  it  does  simplify  the  exposition  (we  explain 
later  how  our  results  extend  to  odd  sample  sizes)  . 

For  a  non-sequential  simultaneous  test  where  F.  denotes  the  distribution  of  the 

Ith  sample  value,  the  null  hypothesis  of  homogeneity  asserts  that  Fl  =  F2  =•••  =  FN 

without  specifying  the  common  distribution.  The  alternative  hypothesis  asserts  that  there 
exists  an  integer  r,  1<t0<t<t1<N  —  1,  such  that  Fx  =  F2  =  •  ■  •  =  Fr_x ,  Ft_x  ^  Ft  , 

and  b  ( Fl ,  F.  j  -  max/(6,  ^  6  [Fk ,  Fj  j  is  strictly  positive  over  j  e  {r, . . . ,  N} ,  usually  with 

r0  =  1  and  Tj  =  N  —  1  as  discussed  previously.  With  respect  to  a  given  cost  function  we 

compute  an  optimal  non-bipartite  matching  M  =  {{xh,Xh},...\xjM,Xh}).  Let 

Rl,R2,...,R2n  denote  the  sequence  labels  associated  with  each  observation  such  that  if 

| Xj^  | ,  Xj^  |  is  the  Ith  listed  pair,  then  R2i  ,  =  j2i  ]  and  R2i  =  j2i.  Finally,  order  each 

individual  pair  as  (Ui,  T  j ,  where  Ui  and  Yj  are  the  minimum  and  maximum  ranks  of  the 
ordering  variable  respectively: 

(3.1)  Uj  =min{/?2M,/?2j.}  and  Yt  =  max{f?2M,/?2(.},  ie{l,. 

With  this  setup  in  place,  we  are  now  ready  to  propose  new  change-point  tests  based  on 
the  ordered  rank  pairs  associated  with  an  optimal  non-bipartite  matching. 
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A. 


THE  SUM  OF  PAIR-MAXIMA  TEST 


1.  The  Tn  Statistic 

Under  the  alternative  hypothesis  that  some  distribution  change  has  occurred 
within  a  sequence  of  observations,  one  expects  an  optimal  non-bipartite  matching  to  pair 
observations  that  are  closer  together  in  sequence  than  would  be  the  case  under  the  null 
hypothesis.  This  suggests  summing  the  differences  Y.  —  Ui  across  all  pairs  and  rejecting 
the  null  hypothesis  if  the  sum  is  less  than  some  critical  value.  We  consider  an  equivalent 
test  statistic  TN  based  on  its  relationship  to  this  sum: 

(3.2)  r»=ir=U(2«+i)+izh-v). 

i=l  ^  2*  i= 1 

We  call  Tn  the  Sum  of  Pair-Maxima  (SPM)  test  statistic,  with  rejection  of  the  null 
hypothesis  indicated  by  small  values  of  this  sum.  We  proceed  to  derive  the  mean  and 
variance  of  TN ,  and  show  that  TN  has  a  limiting  normal  distribution  by  invoking  a  central 
limit  theorem  result  attributed  to  Stein  (1986). 

2.  Expected  Value  and  Variance 

A  sequence  (Z15...,Zt)  of  random  variables  is  said  to  be  exchangeable  if  for  any 
permutation  n  of  indices  {l,...,k},  the  joint  probability  distributions  of  (Z, , . . . ,  Zk  j  and 

[z^,...,Z^k^j  are  identical  (Fristedt  and  Gray,  2004).  Under  the  null  hypothesis,  each 

of  the  N\  possible  assignments  of  sequence  labels  is  equally  likely  and  the  random 
variables  (Yx,...,Yn)  are  exchangeable.  To  obtain  expressions  for  the  expected  value  and 
variance  of  TN  we  apply  equations  (3.3)-(3.5),  which  are  derived  in  Appendix  A: 

(3.3)  P{Yl=t)=  *~l  t  =  2,...,2n, 

n[2n-l) 
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P{Y2=t\Yl=s)  = 


(MG9"3) 

(s-l)(«-l)(2/?-3) 
t- 3 

(n-l)(2«-3) 


,  t  =  2,...,s-l. 


t  =  s  + 1, . . . ,  2n  , 


£[!?]  = 
E[Y2Yx}  = 
Var(J^)  = 


2(2/i  + 1) 

3 

(2« +  1)  (3o  +  l) 

3  ’ 

8(2n  +  l)(5«  +  2) 
45  : 

(2« +  l)(«-l) 


Cov(y[,72)  =  - 


4(2/7  + 1) 


The  following  are  now  immediate: 


^=g[rvH>gM=2"(23"+1>. 

=  Varfl^]  =  «Var[7,]  +  ft(«-l)Cov[71,72] 
«(n-l)(2n  +  l) 


3.  Using  Stein’s  Method  to  Establish  Asymptotic  Normality 

One  important  result  of  our  research  is  the  establishment  of  a  central  limit 
theorem  for  TN ,  namely  that 


P\Tn  Mn  <t  j-HP(f)  as  N 


for  every  -qo  <t  <  oo ,  where  4?  is  the  standard  normal  cumulative  distribution  function. 

n 

While  the  Y.' s  in  the  sum  TN  =  ^  7  do  have  identical  marginal  distributions,  they  are 
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not  independent  so  the  classical  central  limit  theorem  is  not  applicable  here.  Instead,  we 
prove  (3.7)  by  a  technique  referred  to  as  Stein’s  method.  Stein’s  method  (Stein,  1972, 
1986)  is  based  on  a  simple  differential  equation  that  characterizes  the  normal  distribution, 
and  an  idea  called  “coupling,”  which  involves  the  construction  of  auxiliary  random 
variables  that  are  “close”  to  the  variables  under  investigation.  This  method  establishes 
bounds  on  the  distance  from  normality  for  certain  cases  of  dependence,  including  our 
case. 


We  exploit  the  combinatorial  structure  of  TN  to  construct  an  exchangeable 
coupling  that  allows  us  to  invoke  Stein’s  results.  Let  WN  =  ( TN  -  juN )  /  crN .  On  the  same 
probability  space  on  which  TN  is  defined  we  construct  a  random  variable  “close”  to  TN , 
which  we  denote  TN ,  by  selecting  distinct  integers  u  and  v ,  u  <  v ,  at  random  from 
1,2,..., «  (with  ft  >3),  switching  R2u  and  R2v,  and  taking  the  sum  of  the  modified  pair 
maxima.  For  (^TN,fN^j  to  be  an  exchangeable  pair  simply  means  that 
p[Tn  =t,fn  =  =  p{t„  =  t,Tn  =  for  all  t  and  t  .  This  symmetry  may  be  shown  as 

follows.  Let 

0.8)  r.M=Er+Er+Er 

1=1  i—u+l  *=v+l 

be  the  sum  of  pair-maxima  for  all  pairs  in  a  matching  except  for  the  wth  and  vth  pair, 
where  Yt  denotes  the  pair-maxima  of  the  /'lh  pair  as  before.  Therefore, 

TN=To{U’V)  +  maX  (^2„-l »  R2u  )  +  maX  (  R2v-\  ,  R2v  )  . 

TN  =  T0  (ft,  v)  +  max  (  R2ii  , , Rlv )  +  max (R2v_i ,  Rlu ) , 


for  any  choice  of  u  and  v .  Conditioning  on  T0  (w,v)  gives 
(3.10) 


P(Tn  =tX  =0  =  E  E  P[Tn  =t’R«  =t\T0(li,v)  =  t0(u,v))jP(T0(u,v)  =  t0(u,v)) 

(u.v)  t0(u,v) 
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Now  let  <7j  (w,v)  <  a2  (u,v)  <  a2  (m,v)  <  o4  (w,v)  be  the  ordered  values  of  >  ^2„. 
f?2v_j ,  and  i?2v.  Then,  conditional  on  T0  (w,v) ,  the  only  values  Tv  and  TN  can  assume  are 
bx  (m,v)  =  T0  (m,v)  +  a2  (m,v)  +  a4  (w,v)  and  b2  (w,v)  =  T0  (w,v)  +  a3  (w,v)  +  o4  (w,v) .  It 
follows  directly  that 

=^(m,  v),Tn  =^(M,v)|r0  (u,v))  =  0; 

p(tn  =b2(u,v),TN=bl(u,v)\T0(u,vfj  =  ^; 

(3.H)  _  , 

p[tn  =bl(u,v),TN  =b2(u,v)\T0(u,v)')  =  ~; 

P(TN=b2{u,v),fN=b2(u,v)\T0(u,vj}  =  ^; 

so  the  joint  conditional  probability  distribution  p(tn,Tn\to  (n,v)j  is  symmetric. 
Therefore,  by  (3.10),  p[Tn  =  t,fn  =  t)  =  P(Tn  —  t Jn  =  t)  and  so  ( TN,fN )  is  an 
exchangeable  pair.  Now  define  W N  =  ( fN  -  //v  )  /  crv .  In  the  same  manner  (  IV N ,  if  v )  is 
an  exchangeable  pair. 

Moreover,  TN=TN+AN.u^,  where  AN;KtV=Yu+Yv-Ya-Yv  with 

Yu  =  max (Rlu_x, R2v)  and  Yv  =  max(R2u,R2v_1) .  Then  for  any  ie{l,...,n}, 

E[Y,\WN]  =  n-1TN  and 


(3.12) 


where 


(3.13) 


qn 

2n(n-\) 


(2«+i)(2«-i)  r, 

3(«-l)  2«(«-l) 


1  1  n  j2~l 


..  2 n  j— 1 

&v=ZZZZmaX  {R2j,-k >  R2j2-i  )  =  X  X  maX  (b  2  )  - 

A=0  ^=0  y2=2  yj=l  7=2  1= 1 

2n 

=  £7%/- i)-r„ 


y=2 


2n(2n  +  l)(2n  -l) 
3 


-T 
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Combining  these  expressions  gives 


(3-14)  E[WN\WN]  =  (1-AN)WN, 

where 


(3.15) 


K  = 


2  n  - 1 


(n-i)' 

From  Theorem  2.5  of  Rinott  and  Rotar  (2000)  we  have 


(3.16) 


A 


P{WN<t)-®{t)<—  Var \E  {WN-WNf\Wr 


W  -W 

rr  N  rr N 


which  in  the  present  case  reduces  to 

|P(»;  <  f)  - ® (»)|  <  -y^Var \e [a=„ ~  |  W„ ]} 


(3.17) 


6-45 


3/4 


7/4 


E  A, 


We  now  show  that  n  4Var|fi  [a^.h  v  |  WN~^  — >  0  and  n  7/2£’||AjV.i(v|3|  — >0  from  which 

| P{WN  <t)-0(f)|  — >0  will  follow.  Because  the  second  condition  easily  follows  from 
the  fact  that  |  AN.u  v  \<2n-3,  we  focus  on  the  first  condition.  We  cite  two  results  that 
will  be  useful: 

Lemma  3-1:  Suppose  that  3,  and  3,  are  a  -  fields  with  3j  c  32 .  Then  for  any 
random  variable  U  that  is  measurable  with  respect  to  both  3,  and  3, 


(3.18)  Var  [U  |  3j}  <  Var  [E  [U  |  32  ]}  . 

Proof  of  Lemma  3-1:  See  Theorem  34.4  of  Billingsley  (1986),  p.  470.  Let 
V  =  E[U  |32],  giving  E[V  \  3j  =  E[U  \  3j  and  Var {e[U  \  32]}  = 

Var(F)  =  Var{C [L  |  3j}  +  E\Var[V  \  3j}  >  Var{E[V  \  3t]}  =  Var{E[U  \  3j}  .  ■ 
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Lemma  3-2:  Let  m  and  N  be  positive  integers  with  2m<N ,  and  let  ( tt, , 7r2 ) 
denote  the  first  and  last  in  elements  of  a  random  pennutation  of  size  2m  taken  from  the 
integers  {1, 2 Then  for  any  real-valued  function g  satisfying  E [g2 )]  <  oo  , 

(3.19)  Cov[g(^),g(^2)]  = -Cov,[g(;rJ,g(;r2)](^m-l), 

where  CoVj  [g(^1),g(^2)J  refers  to  the  covariance  taken  over  randomly  selected  pairs 

(N  —  m)\(N  —  m)\ 

of  /« -permutations  with  at  least  one  common  element,  and  pNm  =- - — — - is 

N\\N  —  2m)\ 

the  probability  that  two  randomly  selected  pairs  of  m-permutations  have  no  element  in 
common. 

Proof  of  Lemma  3-2:  Let  E0  [g(^1)g(^2)J  =  pg  denote  the  expected  value  of 
the  product  g(^1)g(^2)  where  the  permutations  nx  and  n1  are  chosen  independently 
from  { 1, 2 and  E [g {nx )]  =  pg  .  Then 

(3.20)  E0  [g ) g {n2 )]  =  pNmE [g {nx  )g(zr2)]  +  (l- pNm ) Ex  [g (nx ) g (n2  )]  , 


where  fij  (•)  refers  to  expectation  taken  over  pairs  of  permutations  that  have  at  least  one 
element  in  common,  so 

E[_g{^x)g{^2)\  =  PN,m)Ex[_gM§{^l)^ 

=  PN\m  (Pg  P N .m  )  E\  [g  (^1  )  g  (^2  )])• • 


(3.21) 
Therefore 


(3.22) 


Cov[g(^),g(^2)]  =£[g(^)g(^2)]-//e2 

=  P~N,m  P N,m  )  El  [§  (*i  )#  (*2  )])  ' 

=  (  E<  [g  (*l  )  g  (*2  )]  -  Pg  )  ( P~N,m  ~  l) 

=  -CoVj  [g(^j),g(^2)]  [pN\m  -l) 


Pg 


as  asserted. 
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By  Lemma  3-1  it  is  sufficient  to  show  that  n  4Var|fi[A^.u  v  |  — »  0  for  = 

a (Rv..., Rn),  where 


(3.23) 


e[aI,,J^n]  =  2 


n  r- 1 

r= 2  s=\ 


i(n  —  l) 


Taking  the  variance  yields 


Var{£[A^;iJ3„]}  =  2 


(3.24) 


Var  J^n-i,2_  ,/  Cov  ( 2 ,  A^.j^ ) 

n(n- 1)  n(n- 1) 

Cov  (a^,  ,  An.3A  ) 


+  (n-2)(n-3)- 


?  ( /7  —  1 ) 


Now  use  the  fact  that  |  An.uv  <  2n  -  3  to  show  that  the  first  two  terms  on  the  right  in 


(3.24)  go  to  zero  when  multiplied  by 


_-4 


Cov[A»aJ.A^4]<(2«-3)'(ft‘4-l) 


where 


By  Lemma  3-2, 


(3.25) 


(N-4)(N-5)(N-6)(N-7) 
Pna  N(N-\)(N-2)(N-3) 

8(2N-1)(N2 -IN +  15) 
N(N-l)(N-2)(N-3) 


It  follows  that  P(WN  <  t)-O(t)  — »  0  as  claimed.  ■ 


4.  An  Improvement  to  the  Normal  Approximation  by  Edgeworth 
Expansion 

For  small  or  moderate  n,  the  error  associated  with  the  normal  approximation  may 
be  unsatisfactorily  large.  This  error  can  be  reduced  slightly  by  an  Edgeworth  expansion, 
which  approximates  the  distribution  of  interest  using  the  nonnal  distribution  plus  higher 
order  corrections  that  adjust  for  non-zero  moments  of  third  order  and  above.  We  derive 
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an  Edgeworth  expansion  to  include  the  skewness  of  TN ,  tc3  =  E  ( TN  -  jlin  )3 
Using  conditioning  arguments  as  before,  we  have 

(2/7 +  l)(24/72  +  15/7  +  l) 


as  follows: 


15 


(3.26) 


£rj,ly2].4(2'!  +  1K15',!  +  10',  +  1) 


45 


E[YJ2Y3]  = 


16(2/7  + 1)(70//2  +  49/7  +  6) 


945 


See  Appendix  A  for  details.  Now  write 
(3.27) 


1=1  j  j*k 


and  apply  exchangeability  to  obtain 

E [7£ ]  =  nE [if ]  +  3/7 (n  - 1) E \_YX% ~\+n(n-l)(n-2)E[YlY2Yi\ 


(2/7  +  l)(24n2  +\5n  + 1)  4(2/7  +  l)(l5«2  +10/7  + 1) 

-  11 - 2 - -  +  3/7  in !  —  1 ) - 2 - - 

15  v  ’  45 


=  n 


(3.28) 


+  /7(/7 —  1 )  ( 77  -2) 


16(2/7 +  l)(70/72  +49/7  +  6) 


945 


Therefore, 


(3.29) 


=  ( 1 1 20/74  + 1 204//3  +  236//2  -  43/7  +  3) . 


(4-Aw)3  =  E [^jv ]  —  3 -  Ajv 

—77  (  77  -  l)  (2/7  +  l)  (2/7  +  3) 


945 


=  *V 


Note  that  /r3  <  0  V/7  >  1 ,  so  7f  is  negatively  skewed  for  all  cases  (since  by  assumption 
we  consider  cases  with  at  least  two  pairs). 

Now  let  Fn  be  the  distribution  function  for  the  standardized  random  variable 
ZN  =  (Tn  —  juN  )  /  (JN ,  and  let  Xr  =  E  [zf  J  denote  the  /•"'  cumulant  of  ZN  .  In  particular, 
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(3.30) 


=  =  7^(2',+3)  , 

<Tv  Hyjnjn  -1)(2«  +  1) 


Then  the  Edgeworth  expansion  for  FN  with  respect  to  the  standard  normal  distribution 
function  ®  may  be  written 


(3.31) 


Fn  (*)  =  ®  W  -  +0{nl ) 

=  q(x)  +  ^L  (2n  +  3)  (xi-lje^+oU-1) 
126  yflft  /7^/(/7  —  1)  (2/7  + 


(Wallace,  1958).  It  is  evident  that  the  Edgeworth  expansion  (3.31)  makes  a  positive 
correction  to  the  normal  distribution  outside  one  standard  deviation  from  the  mean,  and  a 
negative  correction  inside  one  standard  deviation.  Table  1  shows  the  critical  values  for 
N  =  200  obtained  by  Edgeworth  expansion  compared  to  estimates  obtained  by 
simulation  and  normal  approximation. 


a 

Simulation 

Edgeworth 

Normal 

0.001 

12746 

12749 

12750 

0.005 

12854 

12857 

12858 

0.01 

12908 

12910 

12911 

0.05 

13050 

13054 

13054 

0.1 

13128 

13130 

13131 

Table  1.  Critical  values  of  the  TN  statistic  for  iV=200  estimated  by  10,000 
simulations,  Edgeworth  expansion,  and  normal  approximation. 
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The  improvement  appears  rather  small,  but  taking  the  skewness  of  TN  into  account  in  an 

Edgeworth  expansion  provides  improvement  nonetheless  and  reduces  SPM  test  false 
alarm  rates  slightly  relative  to  the  normal  approximation.  Appendix  B  lists  approximate 
critical  values  for  various  significance  levels  and  sample  sizes  using  (3.31). 

5.  Treatment  of  Odd  Sample  Sizes 

If  the  sample  size  is  odd,  non-bipartite  matching  will  leave  one  data  point 
unassigned.  Conceptually,  it  poses  no  difficulty  to  extend  our  results  to  this  case  as  we 
now  proceed  to  do.  Let  N  denote  the  total  sample  size  as  before.  For  clarity  we  let  Tm 

and  Tm  denote  the  sum  of  n  matched-pair  maxima  in  the  even  ( N  =  2n )  and  odd 
( N  =  2n  +  l)  cases  respectively.  Define  a  stochastic  replica  of  Tm  as  follows: 

1)  Select  integer  u  at  random  from  the  set  {1, 2, ... ,  2 n  + 1} ; 

2)  If  u<  2 n  take  fm  =  Tm  +  2n  +  \-  7L(M+1)/2j ; 

3)  Otherwise  take  fm  =  TN0 . 

Equivalently,  fvi  =  Evo  +  8n  [in  + 1  -  lj(„+l)/2j  j  where  Sn  is  an  independent  Bernoulli 

random  variable  with  success  probability  In  / (2n  +  \) .  Using  this  expression,  the 

expected  value  and  variance  are  obtained: 

^m=E[Tm}  =  E[fn]  =  E[Tm]  +  ^\2n  +  l^^l^ 

L  J  In  + 1  3 

_2n{2n  +  \)  <  In  _4»(»  +  l) 

K  ’  3  3  3 

(2n  +  2\ 

~  Mno  ’ 

y  2n  +  1 ) 
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(3.33) 


=Var[fm]  =  £[var[fm  |<5,]]  + Var[£[fm \S,]\ 

«(« -l)(2n +  7)  2  n  «(n +  l)(2n +  3) 

“  45  +_9~_  45 

2  (« +  l)(2n +  3) 

N0  (n  —  l) (2/7  + 1) 

In  standardized  terms,  the  correction  made  by  adding  an  observation  to  the  even  case 
becomes  negligible  as  the  sample  size  increases  and  does  not  affect  the  asymptotic 
nonnality  argument  of  the  previous  section. 

6.  On  the  Consistency  of  TN 

Henze  and  Penrose  (1999)  establish  that  Friedman  and  Rafsky’s  minimum 
spanning  tree  test  is  consistent  against  all  alternatives  for  the  two  sample  case 
Xl,...,Xm~F,  Xm+l,...,Xm+n  ~  G ,  H0:F  =  G,  Hy.F^G,  F  and  G  unknown. 

Rosenbaum  (2005)  argues  for  the  consistency  of  his  cross-match  statistic  Mc  distributed 
as  in  (2.24)  and  (2.25)  against  alternatives  of  the  form 
Xi , . . . , Xm  ~  F  =*=  G  ~  Xm+1  ,...,Xm+n  by  showing  that  it  is  a  consistent  test  for  comparing 

two  discrete  distributions  with  finitely  many  mass  points.  He  heuristically  extends  that 
consistency  argument  to  the  general  case  by  the  fact  that  any  two  distributions  may  be 
approximated  arbitrarily  closely  by  two  discrete  distributions  with  finitely  many  mass 
points.  We  do  not  try  to  formalize  that  argument  here;  rather  we  theoretically  motivate 
the  use  of  the  SPM  test  under  alternative  hypotheses  by  proving  a  consistency  result  for 
Tn  that  depends  on  the  consistency  of  Mc  : 

Proposition  3-1:  TN  is  consistent  against  all  alternatives  of  the  form 

(3.34)  Xx,...,Xm_x~F^G~Xm,...,Xn  3me{2,...,N} 

against  which  Mc  is  consistent,  where  Mc  is  Rosenbaum’s  cross-match  statistic  as 
defined  in  (2.24). 

We  prove  the  case  for  even  N  and  a  change  point  that  divides  the  observations 
into  two  subsets  each  containing  an  even  number  of  elements;  the  same  reasoning 
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extends  to  odd  cases.  Let  N  =  2  n ,  suppose  Mx  + 1  is  a  change  point,  M]  and 


M,  =  N  —  Mx  are  both  even,  and  Mx  /  N  — >  tt,  constant  as  A  — >  oo  .  Adopt  the  subscript 

notation  “0”  or  “1”  to  denote  probabilities  (or  expectations  or  variances)  taken  under  the 
null  or  alternative  hypothesis,  respectively.  We  will  show  that 


‘  /A 


< 


1  as  N  — >  oo  for  any  significance  level  a ,  where  za  denotes  that 


a- 


quantile  of  the  standard  normal  distribution.  We  build  our  proof  on  the  following  fact: 


Lemma  3-3:  Let  TN,k  denote  the  random  variable  obtained  by  matching  among 
the  first  Mx  points  alone  (call  that  set  of  pairs  Px  ),  matching  among  the  last  M,  points 
alone  (call  that  set  of  pairs  P2  ),  randomly  choosing  k  pairs  in  Px  and  k  pairs  in  P2  ( k  is 
even  for  this  case  since  MX,M2,  and  N  are  all  even),  and  randomly  swapping  one 
element  from  each  selected  pair  in  Px  with  one  element  from  each  selected  pair  in  P 
(each  pair  gets  one  swap),  and  finally  computing  the  pair-maxima  sum  on  the  new  pairs. 
Let  A/  be  the  change  in  the  pair-maxima  sum  due  to  the  /lh  swap,  i  =  \,...,k.  Then 
under  null  or  alternative  hypotheses, 

(3.35)  TNk  is  distributed  the  same  as  TN  conditional  on  Mc  =  2k 


and 

(3.36) 


Nk 


T  +T 

±  M.  1  ±  M 


MxM2 


where  k  e  {0,...,min(M1,M2)/ 2|,  the  A.  are  exchangeable,  and  each  A.  is 
independent  of  Mc  . 

Proof  of  Lemma  3-3:  That  T^k  is  distributed  the  same  as  T,  conditional  on 

Mc  =  2k  is  a  consequence  of  all  within  group  matchings  being  equally  likely  and  all 
cross-group  matchings  being  equally  likely.  Each  swap  constitutes  two  cross-matches. 
M  M  k 

Tn k  ~  TMi  +  TMi  H - - — 1  A.  results  from  the  fact  that  by  the  construction  of  TN k , 

2  i=x 
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before  any  swaps  occur  the  total  pair-maxima  sum  is  TM  +  TM  +  Mx  ,  where  ZM]  is 

Mn 

the  pair-maxima  sum  over  Px  and  TM  +  Mx  is  the  pair-maxima  sum  over  P2  .  After 

k 

the  swaps  the  final  pair-maxima  sum  increases  by  a  total  ^  A.  relative  to  the  pre-swap 

!= 1 

sum.  Exchangeability  and  independence  are  by  construction  of  TN  k .  m 
Proof  of  Proposition  3-1:  By  (3.36), 

£«[J’„]  =  £„[£o[j;|Mc]]  =  £t  E„  T^+T^+M^  +  Y^A,  Mc  =  2k 

(3.37)  '=1 


=  E„[TM]  +  Ea\Tu,\  +  ^+EP^Ea[At\. 


Now  solve  for  E0  [A,]  directly  by  substitution  using  (2.27)  and  (3.6): 

(3.38) 


E0  A,  = 


2(£o  Tn  E0  Tm i  E0  Tm 2  j  MjM2 

£„K] 

2  ( N  (N  + 1)  -  Mx  (Mx  + 1)  -  M2  (M2  + 1))  —  3 MxM2 
3  MxM2 


(77-1) 


2 ( N( N  + 1) - irxN(nxN  + 1) - (l - ttx ) JV((l - tt, ) N  + 1)) - 3tt,  (1  -nx) N: 

37Tj  (1  —  TTj  )  TV2 


-(*-!) 


N- 1 


Alternately,  E0  [A,  ]  can  be  found  more  directly  by  noting  that  with  every  swap  the  pair- 
maxima  value  from  Px  gets  replaced  by  the  pair-minima  value  from  P2  .  Denoting  the 
pair  minima  of  the  first  swapped  pair  in  P2  as  UV2  and  the  pair  maxima  of  the  first 


swapped  pair  in  Px  as  Yx.x ,  we  have 


(3.39)  E0  Aj  =E0  UX2  —  Yxl  =  Mx  + 


(M2+l)]  2(Mx+\)_N-\ 
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Since  equation  (3.36)  holds  under  both  null  and  alternative  hypotheses,  we 
proceed  as  above  to  find  an  expression  for  Ex  [fv] : 


Tn  —  Ex 


Ex  Tn 


=  £, 


M  k 

Tu+Tm,  + 

^  1=1 


Mc'  =  2k 


(3.40) 

=  E0 

Tu] 

+  E0 

=  E0 

T«] 

+  E0 

In 

MM. 

1  1  2  1 

e\mc' 

1  1 

2 

2 

MM. 

1  1  2  1 

(N-l) 

1  1 

2 

,  6 

E«\\ 


Ex  ML  , 


so 


(3-41  )EX  Tn\-EATn\  = 


N-l] 


M 


c ' 


-EAMC\\  =  N 


'N-l' 

Mc 

~E0 

Mc 

.  6  , 

N 

Rosenbaum  argues  that 
sufficiently  large  N 


e1[mc\-e0\mc 

N 


6  <  0  as  N  — >  oo ;  therefore,  for 


,,  _N(N-l)(Sn)  &N(N- 1)« 

(3.42) — — - — -< — - — - -  --  , — >  —  oo  as  N  — ►  oo 


cr. 


6cr. 


JN(N-2)(N  +  1) 


again  using  (3.6).  Now  we  need  one  final  lemma  to  complete  our  proof: 
Lemma  3-4: 

VarJrJ 


(3.43) 


Varo  tn 


0(1) 


where  0(  ■)  is  the  standard  Landau  notation. 


Proof  of  Lemma  3-4:  We  rely  on  two  facts  that  follow  directly  from 
Rosenbaum’s  argument  for  the  consistency  for  his  cross-match  statistic;  namely, 

Ex  [Mc  |  =  0(N)  and  Vap  |mc  |  =  0[N).  First,  expand  Vaq  [fv]  by  conditioning: 

(3.44)  Varl\TN\  =  Varl\El\TN\Mc"  '  17  "T~~]rr  1 


+  EX  Yap  Tn  M 


Now  express  each  term  on  the  right  hand  side  using  (3.36).  The  first  tenn  is  simply 
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Var, 


e\tk  |  M‘ 


(3.45) 


=  Var. 


=  Vap 


M  k 

tu+t^+m,-l+J2  a, 

^  i= 1 


Mc  =  2k 


E„[TM\  +  Ea\Tu]  +  ^+Ml0(N) 


=  0(V2)Var1[Mt 
=  0  ( V3 ) . 


For  the  second  term,  we  first  note  that  TM  and  TM  are  independent  by  construction,  so 


Var,  Tn  M 


(3.46) 


=  E, 


=  E, 


Var,, 


T„,  +  Tu%  +  M,  +  £  A  J  Mc'  =  2i 


Z=1 


Var0  +Var0  l  +  Vaf 


EA, 


Mc'  =  2k 


+  Covr 


Mc'  =  2£ 


It  is  clear  that  the  exchangeable  A .  s  are  negatively  correlated,  since  A .  =  Ur2  —  Yn  with 
the  Ut.2  negatively  correlated,  the  Yrl  negatively  correlated,  and  the  U r2  independent  of 


the  F , ’s.  Therefore, 


Var„ 


EA 


i= 1 


Mc  =  2k 


^Var0[A,.|Mc  =2£]  +  ^Cov, 


1=1 


(3-47) 


M 


cJ 


Var0  [Ajd 


Mc'  [mc'  -  lj 


A,,  A  j  Mc'  =2k 


Cov0  Aj,A2 


< 


Mc 


Var0  UV2-YVX  =  M  OlN  \ . 


Furthermore,  the  A,  s  can  be  seen  to  be  negatively  correlated  with  Tu  and  TM  ,  since 
larger  TM^  values  are  associated  with  larger  F ,  values  and  larger  TXI  values  are 
associated  with  smaller  U..2  values.  So, 
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Ex 


Var,  Tn  M1 


<EX 


Mc/  =  2k 


(3.48) 


=  E, 


I  Var0 1  rM|  |  +  Var0 1  TMi  J  +  Var0  A,. 

i= 1 

[<9(V3)  +  McO(V2) 

=  C>(v3). 

Combining  (3.45)  and  (3.48)  results  in  Var  [7^]  =  0(a3)  ,  and  so 
Varj  [Tn\/ Var0  [Tn ]  =  0( l)  as  claimed. 

Finally,  to  conclude  the  proof  of  our  consistency  proposition,  we  apply 
Chebyshev's  inequality.  Choose  any  real  number  s  >  0  and  any  significance  level  a . 
Then 

4>  (K  M  >  ^Va4r»])  >  p{t„ -Et  [r„]  >  % 


(3.49) 


=  R 


>P, 


Tn  —E0  [rw]  >  s^j\avl\TN}+El  [ Tn}-E0  [Tw] 


T  —  F  T 

1N  ^0  1N 


> 


VVaro  [r(v] 

for  sufficiently  large  V, 


VVaro[r»] 

since  Var,  [rv ]  /  Vap  [Tv  ]  =  O (l)  by  Lemma  3-4  and  (E]  [Tn ]  —  E0  [Tn ])  /  < 

N —>  oo  by  (3.42).  The  inequality  (3.49)  is  true  for  any  s>0,  so 


-oo  as 


En  Rn 


<  z„ 


CTa 


1  as  N  —3  oo  and  the  proposition  holds. 


7.  A  Graphical  Example 

We  give  a  graphical  example  to  illustrate  how  the  Sum  of  Pair-Maxima  test 
works.  Consider  the  same  set  of  20  points  drawn  from  a  bivariate  normal  distribution 
used  for  illustration  in  Chapter  II  (see  Appendix  D  for  data  values).  Now  suppose  that 
associated  with  each  observation  is  some  sequence  label,  for  example,  the  time  order  of 
the  observation.  Figures  6  and  7  show  two  such  cases,  where  the  plot  symbol  for  each 
data  point  is  its  sequence  label,  and  the  data  are  paired  with  respect  to  the  optimal  non- 
bipartite  matching  on  the  set  (compare  to  Figure  3  in  the  previous  chapter,  which  is  the 
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same  plot  except  without  sequence  labels).  To  represent  a  sample  whose  underlying 
distribution  has  not  changed  with  respect  to  sequence  label,  the  sequence  labels  are 
assigned  at  random  in  Figure  6.  To  represent  a  sample  whose  underlying  distribution  has 
changed  with  respect  to  sequence  label,  the  sequence  labels  are  assigned  from  lower-left 
to  upper-right  in  Figure  7. 


Figure  6.  Minimum  weight  non-bipartite  matching  on  20  points  with  no 
change  in  underlying  distribution  with  respect  to  observation  order. 

Now  we  compute  the  SPM  test  statistic,  T20 ,  for  each  case  and  consider  whether 
or  not  this  test  rejects  the  null  hypothesis  at  significance  level  a  =  0.05  .  For  N  —  20  , 
the  quantile  table  in  Appendix  B  gives  T^'1  =  129 .  In  the  case  of  Figure  6,  one  readily 

computes  T20  =15  +  19-H8  +  ll  +  20-|-17-|-8-|-5-t-14-|-16  =  143>129  =  r2c0nt,  and  so 
the  null  hypothesis  is  not  rejected.  In  the  case  of  Figure  7, 
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T20  =2  +  4  +  6  +  10  +  ll  +  13  +  17  +  16  +  18  +  20  =  117<129  =  r2c0r\  so  the  null 
hypothesis  is  rejected  and  we  conclude  that  the  underlying  distribution  is  changing  with 
respect  to  observation  order.  The  nature  of  change  in  this  example  is  perhaps  extreme, 
but  it  serves  to  illustrate  the  sense  of  the  TN  statistic  as  a  change-point  test. 


Figure  7.  Minimum  weight  non-bipartite  matching  on  20  points  with  a 
change  in  underlying  distribution  with  respect  to  observation  order. 

We  more  thoroughly  examine  the  performance  of  the  SPM  test  by  simulation 
study  in  Chapter  IV.  As  one  might  expect,  while  this  nonparametric  test  has  a  fixed  false 
alarm  rate  regardless  of  underlying  distribution,  its  power  is  somewhat  low.  We  find  that 
we  can  dramatically  increase  the  power  of  this  test  by  considering  a  particular  ensemble 
of  such  matching  statistics,  and  still  retain  a  fixed  false  alarm  rate.  Before  we  present  the 
theory  for  such  an  ensemble  statistic  (in  Section  C  of  this  chapter),  we  first  introduce  an 
alternative  to  the  SPM  test  that  is  also  based  on  non-bipartite  matching. 
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B. 


THE  NON-BIPARTITE  ACCUMULATED  PAIRS  TEST 


1.  The  Mw  Statistic 


The  cross-match  test  statistic  proposed  by  Rosenbaum  (2005)  is  used  when  there 
is  a  clear  delineation  between  two  groups  in  a  sample  (for  example,  treatment  and 
control)  and  the  objective  is  to  test  whether  the  two  groups  come  from  the  same 
probability  distribution.  After  performing  an  optimal  non-bipartite  matching,  one  counts 
the  number  of  pairs  that  link  across  the  two  predetermined  groups.  Under  the  null 
hypothesis  of  homogeneity  this  test  statistic  has  a  simple,  exact  distribution  (see  (2.24) 
and  (2.25))  that  can  be  approximated  by  a  normal  distribution  in  large  samples.  We  now 
consider  an  extension  of  Rosenbaum’s  test  to  the  situation  where  a  sample  is  ordered 
sequentially  but  no  prior  subdivision  of  the  observation  set  exists. 

As  before,  suppose  we  have  N  =  2 n  observations  and  an  associated  optimal  non- 
bipartite  matching  M  ={([/1,T1),...,(C/„,T„)}.  Let  MkN  =  {(Ui,Yi)\Yi  <k;i  = 

denote  the  number  of  pairings  in  a  non-bipartite  matching  that  occur  within  the  first  k 
observations,  2  <  k  <  N  - 1 .  The  cross-match  statistic  Mc  is  equivalent  to  Mk  N ,  and  so 
an  exact  expression  for  the  probability  mass  function  of  Mk  N  under  the  null  hypothesis 
follows  directly  from  (2.24): 


where  P^Mk  N  =  r)  =  P(^MC  =k-2r^J  for  observation  subsets  of  size  k  and  N  —  k. 
Observe  that  M k  N  can  be  expressed  in  terms  of  the  matched-pair  maxima  Y.  by  noting 
that  the  event  M  k  N>  r  is  identical  to  the  event  j  j  |  Yj  <  k  j  >  r  ,  which  in  turn  is  identical 
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to  the  event  Y(r+^<k  where  Y{/]  is  the  jth  largest  among  the  matched-pair  maxima 
(taking  =N  +  \  for  r>n  ).  Thus,  large  values  Mk  N  are  associated  with  small  values 

of  T  and  vice-versa. 

Rosenbaum  uses  the  number  of  cross-matches  as  a  test  statistic,  rejecting  the  null 
hypothesis  of  homogeneity  for  small  values.  The  number  of  within-group  matches  in  any 
one  of  the  two  groups,  Mk  v,  is  an  equivalent  test  statistic,  with  large  values  indicating 

evidence  against  the  null  hypothesis.  We  call  the  vector  Mw  =  {M2N,...,Mn_  1Ar)  the 
Non-Bipartite  Accumulated  Pairs  (NAP)  test  statistic.  Like  TN ,  a  test  based  on  M  v 
rejects  the  null  hypothesis  for  small  values  of  Y. . 

2.  Critical  Envelope 

It  is  possible  to  develop  an  exact  simultaneous  test  based  on  for  cases  where 
the  change  point  k  is  not  pre-specified.  To  do  this  we  seek  a  vector  of  non-negative 

integers  qw  a  =  [qk(j  N,. . . ,  qk<  N  j  (where  we  omit  the  test  level  subscript  on  the  right-hand 
side  for  ease  of  notation  below)  so  that  the  following  is  true  for  a  given  test  level  a  : 

(3.51)  P[MkN<qkN,  k0<k<kx)>\-a. 

We  choose  qkN  to  be  the  1  -a  quantile  of  the  distribution  of  MkJi  so  that  the  non- 
simultaneous  test  at  stage  k  has  level  a ;  that  is, 
P(Mk  ;V  >  qk  N  for  some  k0  <  k  <  kx )  <  a  .  The  problem,  then,  is  how  to  select  a  so  that 
the  simultaneous  test  level  comes  as  close  to  a  as  possible  without  exceeding  this  value. 

To  find  P[Mk  N  <  qk  N,  k0  <  k  <  kx ) ,  we  develop  a  recursive  computational 
scheme  based  on  the  fact  that 

qh,N 

(3.52)  P(MkN<qkN,  kQ<k<kl)=Yj7r{r,kx,N)-P{MhN  =  r), 
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where  7r(r;k,N)  =  P^Mj  N  <qjN,  k0  <  j  <k-\\MkN  =  r),  and  that  7r{r,k,N )  has  a 
recursive  fonn  stated  in  the  following  lemma. 

Lemma  3-5: 

x(r;k,N)  =  ^n(r-l;k-l,N)l(r-l<qk_1N) 

K 

(3.53)  +^^n(r-,k-\,N)l(r<qk_XN), 

k  =  k0+l,...,kl;r  =  Ov(k-n),...,qkN  , 

where  /  ( • )  is  the  indicator  function. 

Proof  of  Lemma  3-5:  Expand  n{r\k,N )  as  follows: 

(3.54) 

n(r\k,N ) 

L(*-i)/2j 

=  X  P (MJ,n  <quNA^J^k-2\M k_lN  =s,MkN=r)-P[M k_UN  =s\MkN=r) 

s=0v(k-\-N) 

L(*-l)/2j 

=  X  j,N  ~  qj,N’K  -  j  -  k  ~2  \  Mk_lN  =  S\p{^k-\,N  =  5  I  3^tiAr  =  r) 

s=0v(A'-l-V) 

=  X  =j|Mt>JV  =  r)-/(s<^_1JV) 

i=0v(A--l-V) 

=  7r{r-\]k-\,N)-P{Mk_XN=r-\\MkN  =r)-l[r-\<qk^N) 

+  7r(r;k-\,N)-P{Mk_1N=r\MkN=r)-l{r<qk_lN) 
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Finally,  we  note  that 

(3.55) 

P{Mk-X,N=r-l\Mk,N=r) 

N 

=  r  —  l,MktN  =  r|item  k  matched  to  s') -P  (item  k  matched  to  .s') 

_  ^=1 _ 

P(MlJ,=r) 

k- 1 

X P [Mk_x  N  =  r  - 1 1  item  k  matched  to  s)  •  P (item  k  matched  to  s) 

_  _s= 1 _ 

P(Mu,=r) 

_  A-  -  I  V ;  ~''~Q  _  2r 

P^Mk  N  =  r) 

to  complete  the  proof.  ■ 

To  start  the  recursion  take  7r[r;k0,N)  =  1,  Vr .  Let 

(3.56)  b(r;k,N)  =  7r(r;k,N)l(r<qkN ) 
and 

(3.57)  (V;k,A)  =  b(r,k,  N^-b(r  -V,k, A) 

From  (3.53)  we  then  have 

(3.58)  n{r\k,N )  =  b{r\k  -1, A)-  —  Ah  (r\k-\, A) 

k 

which  is  suitable  for  efficient  implementation  in  S-PLUS®  (2005),  R  (2005), 
MATLAB®  (2008),  or  other  interpreted  languages.  There  is  no  need  for  asymptotic 
approximations;  finding  an  exact  critical  region  can  be  done  quickly  at  any  practical 
sample  size  using  trial  and  error.  An  implementation  for  R  is  included  in  Appendix  C. 
We  use  the  fact  here  that  all  information  carried  through  conditioning  to  the  joint  events 
Mk_ lN  =s,MkN  =  r  is  carried  through  the  single  event  M t  ,  iV  =5,  and  that  the  event 

Mk  N  =  r  implies  either  Mk  ,  N  =  r  or  M k  ,  N  =  r  - 1 . 
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Using  the  exact  method  presented  here  is  far  less  conservative  than  the  Bonferroni 
method.  At  sample  size  N  =  20  a  nominal  a  =  .05  simultaneous  test  achieves  test  level 
.046  using  a  =  .021  while  the  Bonferroni  test  achieves  level  .0044  using  a  =  .0028  (.05 
divided  by  18  based  on  k0  =  2  and  kx  =  19).  Sample  size  N  =  100  achieves  level  .048 
using  a  =  .0046  while  the  Bonferroni  test  achieves  level  .006  using  a  =  .0005  (.05 
divided  by  98  based  on  k0  =  2  and  k}  =  99  ). 

As  an  illustration,  Figure  8  shows  two  cases  for  N  =  100  at  significance  level 
a  =  0.05  .  The  solid  line  is  the  critical  envelope  qw  a  obtained  by  recursion  using  (3.5 1)- 
(3.58).  The  null  hypothesis  case  (homogeneity)  is  modeled  by  drawing  all  100  points 

from  a  bivariate  normal  distribution  BVN(p0,E)  where  p0  =  (0,0)  and  E  is  the 
identity  matrix.  The  alternate  hypothesis  case  (heterogeneity)  is  modeled  by  drawing  the 
first  50  points  from  BVN  (p0,E)  and  the  last  50  points  from  BVN(p,1,E)  where 

p  =  (3,0)  .  We  choose  a  large  mean  jump  for  this  example  to  emphasize  the  response  of 
Mw  to  a  change  point.  Applying  the  NAP  test,  we  reject  the  null  hypothesis  for  any  case 
where  Mk  N>  qkN.  In  this  example,  M  v  for  the  case  of  homogeneity  never  exceeds 
qN  a  and  so  we  do  not  reject  the  null  hypothesis.  In  contrast,  M  v  for  the  case  of 
heterogeneity  exceeds  q  v  (/  not  just  once  but  for  numerous  values  of  k,  so  ample 
evidence  exists  to  reject  the  null  hypothesis. 
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Critical  envelope  and  example  MN  statistics  {N=  100,  a=  0.05) 


Figure  8.  Critical  envelope  for  the  NAP  test  and  two  cases  of  Mw  with 
TV  =  200  and  a  —  0.05 .  Reject  the  null  hypothesis  if  Mw  exceeds  q“ 
for  some  k . 

3.  A  Graphical  Example 

As  we  did  for  the  SPM  test,  we  illustrate  the  mechanics  of  the  NAP  test  with  the 
data  presented  in  Figures  6  and  7.  We  compute  the  NAP  test  statistic  M20  for 
A:  =  2,...,  19  and  consider  whether  or  not  this  test  rejects  the  null  hypothesis  at 
significance  level  a  —  0.05  .  Figure  9  shows  critical  envelope  q20  a  and  M20  for  the  data 
from  Figure  6.  We  find  the  Mk  20  values  easily  by  visual  inspection:  there  are  no  pairings 

that  occur  within  the  first  2  observations,  none  in  the  first  3  observations,  none  in  the  first 
4  observations,  1  pairing  within  the  first  5  observations,  and  so  on. 
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k 


Figure  9.  NAP  test  statistic  for  Figure  6  data,  no  change  point  detected. 

Since  M k  20  <  qk  20  V&  in  this  case,  we  do  not  reject  the  null  hypothesis.  In 
contrast,  Figure  10  shows  q20a  and  test  statistic  M20  for  the  data  from  Figure  7.  Here 
we  see  that  Mk  20  >  qk  20  for  k  =  4,6,  and  11.  A  single  exceedance  alone  is  sufficient  to 
reject  the  null  hypothesis,  so  in  this  case  there  is  more  than  enough  evidence  to  do  so. 
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Figure  10.  NAP  test  statistic  for  Figure  7  data,  change  point  detected. 

We  note  here  at  least  two  potential  advantages  of  the  NAP  test  over  the  SPM  test. 
First,  by  design,  the  NAP  test  allows  one  to  narrow  the  window  of  the  test  for  possible 
change  points,  which  the  SPM  test  does  not  allow.  In  other  words,  if  there  is  prior 
information  that  allows  a  possible  change  point  to  be  restricted  to  a  subinterval  [k0,k^ 

with  k0>2  and  kx<N ,  then  the  critical  envelope  calculation  is  adjusted  accordingly.  A 

second  advantage  is  that  the  NAP  test  gives  information  regarding  not  only  if  a  change 
point  exists,  but  also  when  it  occurred.  For  any  case  where  at  least  one  exceedance 
exists,  let  k  =min| k:MkN  >qkN\ .  We  expect  that  earlier  change  points  would  be 

identified  by  smaller  values  of  k* .  We  examine  the  performance  of  the  NAP  test  in  the 
simulation  study  presented  in  Chapter  IV. 
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c. 


THE  ENSEMBLE  SUM  OF  PAIR-MAXIMA  TEST 


The  minimum  cost  assignment  obtained  by  optimal  non-bipartite  weighted 
matching  is  associated  with  a  random  sample;  therefore,  the  assignment  is  optimal  only  to 
the  specific  data  at  hand.  Another  sample  with  the  same  underlying  distribution(s)  would 
almost  certainly  result  in  a  different  matching  with  respect  to  sequence  labels.  It  is 
natural  then  to  examine  sub-optimal  (but  good)  matchings  for  additional  infonnation 
regarding  homogeneity,  and  evaluate  whether  the  information  in  this  ensemble  of 
matchings  yields  greater  power  to  detect  whether  a  distribution  change  has  occurred.  In 
particular,  we  consider  collections  of  matchings  that  are  orthogonal,  meaning  they  share 
no  common  pair  (this  is  similar  to  the  approach  of  Friedman  and  Rafsky  (1979)  where 
they  examine  orthogonal  minimal  spanning  trees).  We  discuss  a  few  properties  of  such 
collections  as  background  and  then  introduce  a  test  statistic  based  on  collections  of 
orthogonal  matchings. 

1.  Orthogonal  Successive  Optimal  Matchings 

We  use  the  term  orthogonal  successive  optimal  matchings  to  refer  to  matchings 
constructed  by  the  following  process:  compute  an  optimal  non-bipartite  matching  on  the 
original  data,  then  the  next  best  matching  that  is  orthogonal  to  the  first,  then  the  next  best 
matching  that  is  orthogonal  to  the  first  and  second,  and  so  on.  Given  N  —  2 n 
observations  (we  assume  N  even  as  before  to  simplify  exposition)  and  some  associated 
cost  function,  orthogonal  successive  optimal  matchings  have  the  following  properties. 

Property  1:  At  least  N  /  2  matchings  may  be  obtained  by  orthogonal  successive 
optimal  matching. 

Proof  of  Property  1 :  The  following  lemma  follows  directly  from  Theorem  6.6  of 
Chartrand  and  Zhang  (2005):  “If  a  graph  G  has  N  —  2 n  vertices  and  each  vertex  has 
degree  at  least  n,  then  G  has  a  perfect  matching.”  Let  G0  —  (V,E0)  be  the  original  graph 

on  N  vertices  and  N(N  —  l)  /  2  edges  and  let  G,  =  (V,  £.)  denote  the  subgraph  whose 
edge  set  Ei  C  E0  consists  of  those  edges  that  have  not  been  utilized  in  matchings  1 ,...,/. 
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At  the  beginning  of  the  matching  process,  each  vertex  in  G0  has  degree  N  —  1 .  After  the 
first  matching  N  /  2  edges  are  removed  from  G0  with  each  vertex  incident  to  exactly 
one  such  edge,  so  each  vertex  in  Gx  has  degree  N  —  2  and  a  perfect  matching  exists  on 
Gj  by  the  lemma.  After  N  / 2  —  1  orthogonal  successive  optimal  matchings  have  been 
computed,  Gv/2_,  has  degree  (N  —  l)  —  [N/2  —  i)  =  N/2  =  n,  so  at  least  one  more 
matching  exists  by  the  lemma.  ■ 

Property  2:  At  most  N  —  1  matchings  may  be  obtained  by  orthogonal  successive 
optimal  matching. 

Proof  of  Property  2:  From  the  discussion  in  Property  1,  each  vertex  in  graph  Gf 
has  degree  N  —  1  —  i .  Therefore,  GN_X  has  no  edges,  and  no  more  successive  matchings 
exist.  ■ 

The  bounds  associated  with  Properties  1  and  2  are  both  strong  bounds  in  the  sense 
that  there  exist  cases  where  no  more  than  N/2  matchings  may  be  obtained  by 
orthogonal  successive  optimal  matching,  and  also  cases  where  exactly  N  —  1  orthogonal 
matchings  may  be  obtained.  The  following  examples  demonstrate  each  case. 

Example  for  which  no  more  than  N/2  matchings  may  be  obtained:  Let  N  —  6 , 
and  consider  a  regular  hexahedron  under  a  Euclidean  cost  function  in  three  dimensions 
(that  is,  a  polyhedron  with  6  triangular  faces  and  all  edges  of  equal  length).  Label  its  five 
vertices  as  shown  in  Figure  11. 


1 


Figure  11.  Example  for  which  no  more  than  N/2  matchings  may  be  obtained. 
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Without  loss  of  generality,  assume  each  edge  length  of  this  shape  equals  1 .  Now  insert  a 
sixth  point  in  the  center  of  this  shape  (that  is,  at  the  midpoint  of  the  segment  connecting 
the  vertices  1  and  5)  and  edges  to  connect  it  to  all  other  five  vertices.  The  resulting  cost 
matrix  is  given  in  Table  2: 


Ctj  1  2  3  4  5  6 

1 

2 

3 

4 

5 

6 

Table  2.  Cost  matrix  for  the  regular  hexahedron  in  Figure  11. 

It  is  quickly  verified  that  an  optimal  matching  in  this  case  pairs  vertices  1,5,  and 
6  with  any  one  of  vertices  2,  3,  and  4  so  as  to  make  a  matching.  Regardless  of 
tiebreaking  procedure,  the  first  three  successive  matchings  exhaust  all  pairings  of  1,  5, 
and  6  with  2,  3,  and  4.  To  obtain  a  fourth  matching,  vertices  2,  3,  and  4  may  only  be 
paired  among  each  other,  in  which  case  no  partner  exists  for  the  third.  Therefore,  no 
more  than  the  N  /  2  orthogonal  successive  optimal  matchings  guaranteed  by  Property  1 
can  be  constructed. 

Example  for  which  exactly  N  -  1  matchings  may  be  obtained:  Let  N  =  4 ,  and 
consider  a  square  under  a  Euclidean  cost  function  with  its  vertices  labeled  as  shown  in 
Figure  12: 


1  4 


Figure  12.  Example  for  which  exactly  N  -  1  matchings  may  be  obtained. 
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By  inspection  it  is  clear  that,  in  order,  {{l,2},{3,4}}  ,  {{l,4},{2,3}} ,  {{1,3}, {2, 4}}  is 
an  example  of  N  —  \  —  3  orthogonal  successive  optimal  matchings. 

In  fact,  the  set  of  all  possible  collections  of  N  —  1  orthogonal  successive  optimal 
matchings  is  isomorphic  to  the  set  of  all  symmetric  Latin  squares  of  order  N  with  the 
property  that  the  integer  N  is  on  the  diagonal.  Recall  that  a  Latin  square  of  order  N  is 
an  array  consisting  of  the  integers  {l,...,7V}  such  that  each  integer  occurs  exactly  once  in 
each  row  and  once  in  each  column.  The  isomorphism  may  be  described  as  follows. 
Denote  the  N  —  l  orthogonal  successive  optimal  matchings  by  {/W,,...,  M  v_,| ,  where 

Mj  ={K/’Tiy  ),•••,  (u„j>  ynJ  j  j  is  the /h  matching  and  ujj  and  y..  are  the  minimum  and 

maximum  sequence  labels,  respectively,  of  the  z'th  pair  listed  in  the /h  matching.  Enter  the 
integer  j  in  an  NxN  array  at  entries  (uij,yij  'j  and  (_y.y.,w.;.)  for  all  z  e  {l, ...,«}  and  all 

j  G  {l,...,7V  —  1} ,  and  enter  N  on  the  array  diagonal.  The  resulting  Latin  square  carries 

the  information  indicating  the  stage  in  the  succession  of  matchings  at  which  the 
observation  whose  sequence  number  corresponds  to  row  (column)  a  is  paired  with  the 
observation  whose  sequence  number  corresponds  to  column  (row)  b,  a  ^  b .  We 
illustrate  the  isomorphism  with  a  very  simple  example  on  N  =  6  observations  in 
Figure  13. 

5  successive  matchings 

M  1=  {{1,2},  {3, 5},  {4, 6}} 

M  2=  {{1,3}, {2, 6}, {4, 5}} 

M  3=  {{1,4}, {2, 3}, {5, 6}} 

M  4=  {{1,5}, {2, 4}, {3, 6}} 

M  5=  {{1,6}, {2, 5}, {3, 4}} 

Figure  13.  Five  orthogonal  successive  optimal  matchings  on  N  —  6 
observations  with  associated  Latin  square.  Cost  function  does  not 
necessarily  satisfy  the  triangle  inequality. 
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In  our  research,  most  random  samples  we  observed  of  size  N  admitted  N  —  1 
orthogonal  successive  optimal  matchings.  However,  the  fact  that  this  is  not  always  the 
case  requires  that  our  theory  accommodate  cases  where  less  than  N  —  l  orthogonal 
successive  optimal  matchings  are  admitted. 

2.  The  Kn  Statistic 


We  proceed  to  fonnulate  a  test  statistic  based  on  orthogonal  successive  optimal 
matchings.  Let  TN  .  denote  the  sum  of  pair-maxima  statistic  associated  with  the  ith  best 

orthogonal  matching.  It  is  straightforward  to  show  that  TN  .  is  marginally  distributed  as 

Tn  .  We  prove  this  for  the  case  of  continuous  random  variables. 


Proof:  Let  be  some  standard  ordering  of  observations 


[Xl,...,XN')  based  solely  on  data  content  (e.g.,  an  ordering  based  on  observation  norm) 


and  let  C  be  the  cost  matrix  with  respect  to  this  standard  ordering.  Let  V  denote  the  set 
of  all  NxN  0-1  matrices  associated  with  perfect  matchings  on  N  observations;  that  is, 
V  is  the  set  of  all  symmetric  matrices  that  have  a  “1”  entry  for  pairings  in  a  matching 
and  “0”  otherwise.  Two  matchings  U,V  E  'V  are  orthogonal  if  and  only  if  U  oV  —  0, 
where  “  o  ”  is  the  Hadamard  (coordinate -wise)  product  of  U  and  V  .  Then  the  optimal 
non-bipartite  matching  problem  can  be  expressed  as 


(3.59) 


mintr(CF) 
subject  to  V  e  V , 

where  tr  (A)  denotes  the  trace  of  matrix  A.  Now  let  V*  be  the  solution  to  (3.59),  and 


define  nested  sets  orthogonal  solutions  V*_  ,V3*  ,  and 


objective  values  z*,z*2,...,z*  for  i  e  {l, . . . , N / 2}  recursively  as  follows: 
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V*  solves  z*  =  min  tr  (CF) 

subject  to  F  e  "Vj, 

F2*  solves  z2  =  min  tr  (CV ) 

(3.60)  subject  to  F  G =  {F  G  V :  F  o  F(*  =  o}, 

V*  solves  z*  =  min  tr  (CF), 

subject  to  F  G  'Vj  =  {f  G  V:  Fo  F;*  =  0  V  j  G  {l,2,...,i  — l}}. 

For  continuous  random  variables,  z*  <z\<-<z't  with  probability  1 . 

Now,  for  any  pennutation  tt  applied  to  the  integers  {l,...,/V},  let  Av  denote  the 

associated  NxN  permutation  matrix.  If  the  order  of  [x iN^  is  permuted  by  tt  , 

then  the  cost  matrix  for  the  pennuted  observations  is  A^CA^ .  Define  nested  sets 

Xi=/VD'X2D-o\j,  orthogonal  solutions  V*a , V*3 V*4 ,  and  objective  values 

z*  ,,z*  2,...,z* ;  for  i  G  {l, . . . ,  JV7  2}  recursively  like  before: 

V*x  solves  z*  j  =  min  tr  (CnV) 

subject  to  F  e  If  j , 

(3.61)  ; 

Kj  solves  zl,i  =  min  tr  (QF ) , 

subject  to  F  G  If  ,.  =  |F  G  "V :  F o  Fjy  =  0  V/  G  {l, 2, —  1} j. 
Note  that  tr (C_F)  =  tr^A^CA^F)  =  tr(CA!rFA7r')  for  all  F  G  "V  by  the  permutation 
invariance  of  the  trace  operator. 

We  now  show  by  induction  that  for  any  i,  V*  =A  V*A' .  The  assertion  is  true  for 

J  J  ~  7 T,l  7 r  l  7T 

i  =  l,  since  A  V* A'  G  "V  =  "V, , 

‘  7T  i  7T  7T,1  " 

(3.62)  z* .  =  min  tr  (CF)  =  min  tr  (CA'VA  )  =  min  tr(CC)  =  z* , 
and 

(3.63)  tr(c,(^F1X))  =  tr(CT'1,)  =  z;. 
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(3.64) 


Now  suppose  V*j  =  A^V*  A[  for  all  j  e  {l, . . . ,  k)  ,  k  <  N 1 2  .  Then 

Xw-lreVrKo K.J  =  0  v ) e {1,2,...,*:}} 

=  {reV-.4(r°v;j)A,  =  o  vy e {1,2,...,*}} 

=  [v  <=y  :(4VA,) o (AX A )  =  0  Vj €  {1,2,. . . . t}} 
=  {v e "V :  or;  =  o  v/e {1,2,...,*:}}. 

Therefore,  the  problem 

mintr  (Cj) 

subject  to  V  e  X,t+i 


(3.65) 

has  the  same  solution  as 

(3.66) 


min  tx{CA'jAn) 
subject  to  /tjfX  e  yk+1 , 

and  the  solution  to  (3.66)  is  V  =  AjrFi*+1A^[.  Therefore,  V*k+l  =  A_V*k+[Al ,  and  thus  by 

induction  V*.  =  AV*A'  for  any  i. 

n.l  7T  l  7T  j 


Finally,  let  p  be  the  antirank  vector  for  the  ordered  observations  so  that 
X(j.  =  .  Then  for  any  i,  if  V*  solves  the  itb  orthogonal  matching  problem  with 

respect  to  the  standard  ordering,  then  V*pi  is  the  solution  with  respect  to  the  original 

ordering.  Conditional  on  p  is  uniformly  distributed  over  all  N\  possible 

permutations  applied  to  the  integers  {l,...,7V}  under  the  null  hypothesis.  So,  each 

element  of  y  is  equally  likely  to  be  the  ith  orthogonal  successive  optimal  matching,  and 
thus  each  possible  labeling  of  vertices  in  that  matching  is  equally  likely.  Therefore,  TN  . 

is  marginally  distributed  as  TN  .  m 


By  Property  1  above,  TN  .  is  well-defined  for  all  i  <  N  /  2 .  We  define  SN  k  to  be 
the  cumulative  sum  of  pair-maxima  over  the  first  k  matchings: 

(3-67)  .V=E7«. 

1=1 
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Likewise,  SN  k  is  well-defined  for  all  k  <N  /  2.  The  mean  of  SN  k  under  the  null 
hypothesis  is  computed  directly: 

(3.68)  ftw  •V]  =  E£fcl]  =  *:ftv  = 

;=1 


Just  as  Tn  takes  on  smaller  values  under  alternative  hypotheses  than  under  the 
null  hypothesis,  we  expect  that  under  alternative  hypotheses  SNk  will  tend  to  deviate 
below  its  mean  value.  Therefore,  we  define 


(3.69) 


K,,  —  max 

N  *e{l,2,..,JV/2} 


- S 


N,k 


to  be  our  Ensemble  Sum  of  Pair-Maxima  (ESPM)  test  statistic,  where 

(3.7°) 

is  a  scaling  constant  whose  choice  is  motivated  in  the  following  section.  So,  KN 
measures  the  maximum  cumulative  deviation  of  the  Ts .  from  their  mean  over  N  /  2 
orthogonal  successive  optimal  matchings.  For  convenience  we  choose  KN  to  be  the 
negative  of  the  typical  centered  random  variable  so  that  smaller  values  of  SN  k  (which  are 
evidence  of  an  underlying  distribution  change)  correspond  to  larger  values  of  KN  . 


3.  Brownian  Bridge  Motivation  for  K N 

The  formulation  of  the  KN  statistic  is  based  in  part  on  structural  similarities  of 
the  sequence  {SNl,...,SNN/2J  to  a  Brownian  bridge.  Recall  that  a  stochastic  process 
t>0}  is  called  a  Gaussian  process  if  has  a  multivariate 

nonnal  distribution  for  all  tj )  and  for  all  j  e  {l, 2, . . . j ,  and  that  a  Gaussian  process 

{5(f),0<f<l}  with  =  0,  0<t<l,  and  Cov(5(j),5(t))  =  j(l  —  t), 

0<J<t<l,is  called  a  Brownian  bridge  (Ross,  2003,  pp.  622-623). 
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We  desire  an  expression  for  the  variance  of  SNk  to  compare  the  sequence 
{SN1,...,SNN/2j  to  a  Brownian  bridge.  Such  an  expression  depends  on  the  covariance 
between  the  TNi,  which  is  difficult  to  determine  analytically.  However,  simulation 
suggests  that  Cov  (Tw  ; ,  TN  ;  j  =  Cov  (Tn  , ,  TN  2 )  for  all  i  ^  j ,  so  for  the  remainder  of  this 
section  we  assume  this  to  be  true  for  the  sake  of  comparison  to  a  Brownian  bridge  only. 

Under  the  assumption  that  Cov  (7^. ,  TNJ  j  =  Cov  (tn] ,  TN  2 )  for  all  i  ^  j  ,  it 
follows  that 


(3.71) 


Va,\S„\  =  kal  +  ztEMVg 

i—2  j—\ 

ka2N  +  k{k-\)Co\[TNVTN2)  Mk, 


where  the  underscore-tilde  notation  u q2N k”  indicates  that  equality  depends  on  our 

covariance  assumption.  It  is  straightforward  to  solve  for  Cov  (  7). , ,  TN  2  j  under  this 

assumption  by  observing  that  in  any  case  for  which  N  —  \  orthogonal  successive  optimal 
matchings  can  be  constructed,  every  possible  pairing  of  observations  has  been 
considered.  For  this  case  then, 

(3-72) 


which  is  constant  for  fixed  N.  Therefore  Var\SN  A,_l  ]  =  0.  Applying  this  boundary-value 
condition  to  (3.71)  gives 

(3.73)  0  =  Var  [S w  ]  =  (N  —  l)  a2N  +  ( N  —  l)  (N-  2)  Cov  (TN1 ,  TNa )  V  N , 


so 

(3.74) 


Cov  (7^,7^): 


ol 


(A  — 2) 


7V(7V-2)(7V  +  1) 
2(A-2)90 


TV  (TV  -|- 1) 
180 


V  N . 


Therefore,  we  obtain  the  desired  result 
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?N,i t  — 


(3.75) 


ka2N+k(k-l)Cov(TN1,TN2) 
N(N  +  \) 


=  kai  —k[k  —  l) 

w  y  180 

_  kN (N  +  \)(N — k  — 

~  180 


Finally,  define  SN  0  =  0  and  fiN  0  =  0 ,  let  t  —  k  I  [N  —  \)  for  k  e  {l, . . . ,  N  /  2} ,  and 
define  stochastic  process  BN  (t)  by 

1  2  Nil 


(3.76)  Bn(,)  =  ^^  1  ,  t  g 


,  VAT. 


liV-l’7V-l,"‘,(7V-l) 

Then  for  all  5  and  t  such  that  s<t  and  s,t  e  {0,1/(7V  —  l),...,  TV  /  {IN  —  2)}  and  for  all 
even  N  we  have 

(3-77)  ^(0)  =  0, 


(3.78) 


£^(01  =  0, 


and 
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Cov[5A,(5),JgA,(^)]  =  Cov 


^N,s(N- 1)  ^ N,s(N-l )  ^N,t(N-l)  ^N,t(N- 1) 


(l/4)Cov 
(l  /  4 )  Cov 


c  c 


,(N~  1) 

*^V„s(iV-l)  ’  ‘^AT.^V-l)  +  ^JVJ 

y=i(jV-l)+l 


Var 


Var 


'i(JV-l) 

/(7V-1)  ]' 

c* 

N,s(N- 1) 

+  Cov 

E  4., 

’  X]  T»J 

1=1 

j=s{N- 1)+1 

s(iV-l) 

t(N-l) 

\ 

C 

N,s(N-l) 

+  E 

E 

Covfr^.r^] 

i= 1 

i=s(v-i)+i 

/ 

s(N-l)  t(N-l) 

e^-i j+Z)  E  Covfc.i>r/ 


TV, 2 


=  ®(M  +  I 


(=1  y=s(iv-i)+i 

j(TV-l)(f-j)(TV-l) 


TV(iV  +  l) 


180 


(3.79) 


=  s(l  —  5)  —  s(t  —  s) 


This  structure  of  {bn  (t),  t  G  {0,l/(iV  —  l) , . . . ,  N I  (2N  —  2)}|  for  our  choice  of  cN  and 
equal  covariance  assumption  suggests  a  connection  to  the  Brownian  bridge. 

Shorack  and  Wellner  (1986)  present  several  useful  results  pertaining  to  the 
Brownian  bridge,  including  the  following: 

'/{*(<)>«  <  t  <  l}  is  a  Brownian  bridge,  then  for  all  a,  b  >  0 

P(B(t)  <  a(l  —  t)  +  bt  for  0  <  5  <  t  <  u  <  1  \b(s)  =  x  and  B{u)=y  ) 

(3.80) 

=  1  —  exp  I—  2[a(l  —  5)  +  bs  —  x][a(l  —  u)  +  bu  —  v]  l(u  —  5)} . 

Setting  a  =  b  =  A> 0  and  7  =  x  =  0  , 

P[B{t )  <AforO<t<w<l  |  B(u)=y  j 

=  1  —  exp  j— 2A(A  —  v)/  u  j  ,  where  B {u)  ~  TV(0,w(l  — i<)). 


(3.81) 


66 


Taking  the  expected  value  over  B[u)  yields  the  identity 
P[B{t )  <  A  for  0<  t  <  u  <  1,A  >  0) 


(3.82) 


i  A 

—  f 


^2nu  (l  —  w) 


tJu(\—u 


1  —  exp  * 


2A(A-y): 


l 


1  — exp 


2A  (a  —  XtJu(1  —  u^ 


=  T> 


A 

1 

,Vw(1-w)y 

\[2/k 

A 

~  exp  { 

yju(\  —  u'j 

exp 


exp 

y1 

2 u  (l  —  w) 

I  x2} 

-w)) 

- 

1  2  I 

2\^J  u{\  —  ifj 

dy 


dx 


exp 


{— 2A  ^dx 


A  (2m  —  1) 


yju  (l  —  w) 


where  <f>  is  the  standard  nonnal  cumulative  distribution  function.  For  u  =  1/2,  equation 

(3.82)  reduces  to 

(3.83)  P(B(t)<\ for  0  <  t  <  1/2,A  >  0)  =  $(2A)-^-exp{-2A2}. 

Now  define  K  =  sup/6j0 1/2-  B  (t)  for  Brownian  bridge  jf?  (t) ,  0  <  t  <  l} .  It  follows 
directly  from  (3.83)  that  the  critical  value  Ka  corresponding  to  P(K  >  Kr )  =  a  is  a 
solution  to 

(3.84)  1  —  a  —  <f> (2x)  +  i exp {— 2x2 }  =  0  , 

which  can  be  well-approximated  using  standard  computing  software.  In  other  words,  the 
null  distribution  of  K  is  known  and  so  K  is  an  obvious  choice  for  a  test  statistic  if  a 
process  of  interest  is  a  Brownian  bridge.  The  question  at  this  point  then  is,  “Does  the 

process  j  BN  (t) ,  t  G  |0, 1  /  (TV  —  l) , . . . ,  TV  /  ( 2N  —  2)}|  asymptotically  approach  a  Brownian 

bridge?”  Under  our  covariance  assumption,  this  process  has  the  same  mean  and 
covariance  structure  as  a  Brownian  bridge  for  all  N.  Furthermore  Bs  (/  )  is  by 
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construction  a  shifted,  scaled  sum  of  random  variables  whose  marginal  distributions  are 
asymptotically  normal.  Nevertheless,  we  are  somewhat  surprised  to  observe  that  BN  (/) 
is  not  normal  even  for  fairly  large  N.  Figures  14-18  show  normal  quantile-quantile  plots 
of  Bn  (f )  for  10,000  simulations  associated  with  the  null  hypothesis,  using  a  standard 

bivariate  normal  distribution,  A  =  300,  and  t  =  k/(N  —  l)  for 
k  =  1,  10,  30,  100,  and  150.  Figure  14  shows  k  =  1,  which  of  course  is  simply 
(TN-~pN)/cN.  We  have  proven  already  that  TN  is  asymptotically  normal,  and  indeed 
Figure  14  is  confirming  evidence. 


QQ  Plot  of  Sample  Data  versus  Standard  Normal 


Figure  14.  Quantile-Quantile  plot  of  BN  [k  /  [N  —  l))  for  10,000  simulations  of 

N  observations  from  a  standard  bivariate  normal  distribution; 

N  =  300,  k  =  1. 

As  k  varies  from  1  to  N  /  2  ,  BN  (t)  exhibits  markedly  non-normal  behavior  as 
demonstrated  in  panels  Figures  15-18.  In  Figure  15,  for  k  — 10  (corresponding  to 
t  ?a  0.033  )  Bn  (t)  shows  signs  of  slight  positive  skewness.  This  skewness  is  much  more 
apparent  as  k  approaches  the  middle  of  the  interval  as  seen  in  Figures  16-18  for  k  —  30 
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(7«0.10),  £  =  100  (7p^0.33),  and  £  =  150  (7^0.50).  In  other  words,  BN{t) 

constitutes  an  unusual  “natural”  example  of  a  case  where  the  distribution  of  the  sum  of  a 
large  number  of  asymptotically  normal  random  variables  does  not  appear  to  approach  a 
nonnal  distribution. 


QQ  Plot  of  Sample  Data  versus  Standard  Normal 


Figure  15.  Quantile-Quantile  plot  of  BN  (k  /  (iV  — l))  for  10,000  simulations  of 

N  observations  from  a  standard  bivariate  normal  distribution; 

77  =  300,  £  =  10. 
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QQ  Plot  of  Sample  Data  versus  Standard  Normal 


Figure  16.  Quantile-Quantile  plot  of  BN  [k  /  (TV  —  l)j  for  10,000  simulations  of 

TV  observations  from  a  standard  bivariate  normal  distribution; 

TV  =  300,  £  =  30. 


QQ  Plot  of  Sample  Data  versus  Standard  Normal 


Figure  17.  Quantile-Quantile  plot  of  BN  [k  /  (TV  — l))  for  10,000  simulations  of 

N  observations  from  a  standard  bivariate  normal  distribution; 

TV  =  300,  £  =  100. 
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QQ  Plot  of  Sample  Data  versus  Standard  Normal 


Figure  18.  Quantile-Quantile  plot  of  BN  [k  /  (iV  — l))  for  10,000  simulations  of 

N  observations  from  a  standard  bivariate  normal  distribution; 

TV  =  300,  *  =  150. 

Of  course,  this  evidence  alone  does  not  prove  that  BN  (/)  fails  to  approach  a 

Brownian  bridge  in  the  limit;  however,  it  does  establish  that  even  for  fairly  large  N  a 
Brownian  bridge  approximation  may  not  be  a  good  one.  Indeed,  simulation  shows  that 
for  fairly  large  N  (from  100  to  300)  a  Brownian  bridge  approximation  for  BN  (/)  gives 

reasonable  tail  probability  estimates  for  a  near  .10,  but  less  so  for  smaller  values  of  a . 
Table  3  shows  tail  probability  results  based  on  10,000  simulations  of  N  observations 
from  a  standard  bivariate  normal  distribution  for  a  —  0.10,  0.05,  0.025,  and  0.01  where 
the  achieved  test  level  for  each  combination  of  N  and  a  is  the  fraction  of  simulations  for 

which  Kn  =  maxt6{0jl>  N/2}  BN  (k/(N- 1))  >  Ka ,  where  Ka  is  a  root  of  (3.84). 
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a 

Achieved  test  level 

o 

o 

A 

N  =  200 

N  —  300 

0.10 

0.9757 

0.0877 

0.0980 

0.0984 

0.05 

1.1334 

0.0586 

0.0649 

0.0635 

0.025 

1.2731 

0.0385 

0.0432 

0.0419 

0.01 

1.4382 

0.0248 

0.0278 

0.0266 

Table  3.  Achieved  test  levels  for  maxt6{0 ,  A,,,j  BN  [k  /  [N  —  l))  using  Brownian 

bridge  critical  values  for  10,000  simulations  of  N  observations  from  a 
standard  bivariate  normal  distribution.  Achieved  test  level  for  each 
combination  of  N  and  a  is  the  fraction  of  simulations  for  which 

maX*e{0,l,...,V/2}  BN  {k/{N-  1))  >  Ka  . 

Because  the  null  distribution  for  K  N  is  difficult  to  obtain  exactly,  we  obtain 

useful  tail  probability  approximations  by  simulation.  These  tail  probability 
approximations  depend  on  both  sample  size  N  and  dimensionality  d .  Approximate 
critical  values  for  KN  based  on  simulation  are  provided  in  Appendix  B  for  various  values 

of  N  ,  d ,  and  a .  As  indicated  in  the  SPM  test  discussion,  the  ESPM  test  proves  to  be 
significantly  more  powerful  than  a  single  SPM  test.  We  show  performance  results  in  the 
next  chapter. 

We  close  this  discussion  of  an  SPM-based  ensemble  statistic  with  the  comment 
that  it  is  less  clear  exactly  how  to  formulate  an  analogous  ensemble  extension  for  the 
NAP  statistic.  In  that  case,  it  seems  natural  to  find  the  sequence  of  orthogonal  best 
matchings  as  in  the  ESPM  case  and  then  compute  M  Y  for  each  matching.  Letting  M  v  . 

denote  the  M  v  statistic  associated  with  the  7th  best  orthogonal  matching,  one  would 
expect  to  be  able  to  extract  more  change-point  information  out  of  the  collection  of  vectors 
{M^  j  ,  M N  2 , . . . ,  Mj,  Ar_1 1  than  out  of  the  NAP  test  using  Mw ,  alone.  For  now,  we  leave 
study  of  an  ensemble  NAP  statistic  for  future  work. 
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D. 


THE  BIPARTITE  ACCUMULATED  PAIRS  TEST 


The  tests  introduced  thus  far  are  based  on  non-bipartite  weighted  matchings,  and 
they  test  for  a  change  point  over  an  entire  set  of  observations  for  which  there  is  no  control 
set.  That  is,  no  prior  infonnation  exists  regarding  the  in-control  distribution.  In  many 
cases,  however,  some  history  of  observations  which  are  known  (or  assumed)  to  be  in¬ 
control  is  available,  and  the  problem  is  to  determine  whether  a  change  point  exists  in  a  set 
of  future  observations.  Therefore,  we  propose  the  Bipartite  Accumulated  Pairs  (BAP) 
test  for  cases  where  in-control  observation  history  is  available.  As  the  name  indicates, 
the  BAP  test  is  constructed  using  bipartite  matchings  (as  opposed  to  non-bipartite 
matchings  as  in  all  our  previous  tests).  Recall  that  a  bipartite  matching  pairs  observations 
from  one  pre-designated  group  with  observations  from  another,  and  is  a  solution  to  the 
integer  program  (2.13). 

1.  The  Z  Statistic 

Assume  we  have  some  history  {Xv...,Xm}  of  m  control  observations,  and  we 

desire  to  test  whether  a  change  point  exists  in  a  sequence  ( Xm+1 Xm+n  )  of  n  new  test 

observations.  One  approach  to  this  problem  is  to  estimate  the  in-control  distribution 
based  on  the  observation  history  and  then  test  whether  it  is  likely  that  the  new 
observations  are  drawn  from  the  estimated  distribution.  An  alternative  matching-based 
approach  is  to  compute  an  optimal  bipartite  matching  between  the  control  and  test 
observations  and  use  the  information  in  the  matching  to  test  whether  a  change  point  exists 
in  the  test  data. 

We  employ  the  following  approach  for  the  case  m  <  n  (more  test  data  than 
control  data;  we  discuss  the  m  >  n  case  later):  Compute  a  minimum-weight  bipartite 
matching  which  pairs  each  control  observation  with  some  test  observation,  based  on 
some  predetennined  cost  function.  We  emphasize  here  that  some  test  observations  will 
necessarily  remain  unpaired,  unlike  the  non-bipartite  matching  case  where  at  most  one 
observation  is  left  unpaired.  Define  random  variables  Zx,...,Zn_x  where  Zk  is  the 
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number  of  test  observations  among  {Xm+l,...,Xm+k}  which  are  paired  with  control 

observations.  Notice  that  this  test  has  very  much  the  same  flavor  as  the  NAP  test,  as  it 
counts  the  accumulation  of  pairs  in  the  optimal  matching  with  respect  to  the  order  of  the 
test  data.  By  construction  the  Zk  are  dependent  random  variables  each  of  which  is 

marginally  distributed  as  hypergeometric.  In  fact,  Zk  =  Zk  ]  +  8k ,  where  Sk  =1  if  test 
observation  Zm+k  is  paired  with  a  control  observation  and  8k  =  0  otherwise.  So, 


This  test  is  designed  to  test  null  hypothesis  H0:F  =  G  against  alternative 
Hg-F*G  where  X1,...,Xm  ~  F ,  Xm+l,...,Xm+n  rsj  G  ,  and  F  and  G  are  unknown.  If  a 

change  point  is  present  in  the  sequence  of  test  observations,  one  expects  that  more  pairs 
will  be  formed  between  the  first  k  test  observations  and  the  control  observations  than  if 
no  change  point  is  present,  so  the  existence  of  a  change  point  should  be  seen  in  pairings 

,  st 

being  “front-loaded”  in  the  Zk  sequence.  We  call  the  vector  Z  =  [Zl,...,Zn_1)  the 
Bipartite  Accumulated  Pairs  (BAP)  test  statistic. 


2.  Critical  Envelope 

We  develop  an  exact  simultaneous  test  for  Z  in  a  manner  closely  following  that 
for  M  v  of  the  Non-Bipartite  Accumulated  Pairs  (NAP)  test.  Specifically,  we  seek  a 

vector  of  non-negative  integers  qa  =  \qv...,qn_^  so  that  the  following  is  true  for  a  given 
test  level  a : 

(3.86)  P{Zk  ^qk,  1  <  k  <  h-1)  >  l- a  . 

This  allows  us  to  construct  a  simultaneous  level  a  test.  As  for  the  NAP  case,  we  take  qk 
to  be  the  1  -a  quantile  of  the  distribution  of  Zk  (hypergeometric)  so  that  the  non- 
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simultaneous  test  at  stage  k  has  level  a;  that  is,  P(Zk>qk)<a  for  each 
k  e  {l,. . . ,  n  - 1} .  Then  we  use  a  recursive  computational  scheme  to  select  a  that  gives  a 

simultaneous  test  level  as  close  to  a  as  possible  without  exceeding  this  value.  To 
develop  the  recursion,  first  note  that  the  a  -quantiles  satisfy  the  properties 
qk  =  min{r :  P(Zk  <  r )  >  a}  and  q[  <q2  <■■■  <  qnX .  Now  using  the  same  conditioning 
approach  as  in  (3.52)-(3.58),  we  have 

it 

(3.87)  P(Zj<qJ,\<j  <k)  =  YJ7r(r,k)-P(Zk=r) 

r= 0 

where  7z(r,k}  =  P(Zx  <qx,...,Zk_x  <qk_x \Zk  =r)  and  P(Zk=r )  is  given  by  (3.85). 
Observe  that  n(r, k)  observes  a  simple  recursive  relationship  expressed  by  the  following 
lemma: 

Lemma  3-6: 

..  ...  n{r,k)  =  j7r{r-\,k-\)l{r-\<qk_x)  +  t-^n(r,k-\)l{r<qk_l), 

(J.oo)  /C 

k  =  2,..., max [/ :  qx  <  min (/, m)}  ;  r  =  0, . . . , qk  . 

Proof  of  Lemma  3-6: 

n(r,k)  =  ^2p([Zj<qj,l<j<k-2\Zk_x  =  s,Zk  =  r)- P(Zk_x  <s\Zk  =  r) 

=  HP{Zi  <qj’l<j<k~2\zk-x  =s)-P(Zk_x<s\Zk  =r) 

(3.89)  =^K(s,k-\)-P(Zk_x  <s\Zk  =r)-l(s<qk_x) 

=  n{r-\,k-\)-P(Zk_x  =  r-\\Zk  =  r)-l{r-\<qk_x) 
+n(r,k-\)-P(Zk_x=r\Zk=r)-l(r<qk_l) 

Under  the  null  hypothesis,  Zk  =  r  implies  that  each  of  the  k  observations  Xm+x Xm+k 
is  equally  likely  to  be  among  r  pairs  in  the  bipartite  matching;  therefore 

p  (zk-i  =  r  ~  !|  ■ zk  =  r)  =  and  P(Zk_x  =  r\  Zk  =  r)  =  .  u 
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The  recursion  starts  at  k  =  2  and  continues  to  k  =  max  [/ :  qt  <  min  (/,/«)} .  For 
k  =  2  we  have 

0  <r<q2; 

0,  r  >  q2 . 

This  recursion  scheme  is  quite  readily  implemented  in  S-PLUS®,  R,  MATLAB®,  or 
other  interpreted  languages;  an  implementation  for  R  is  included  in  Appendix  C. 

The  simultaneous  test  presented  here  is  clearly  an  improvement  over  the 
Bonferroni  method.  For  example,  with  m  =  25  control  points  and  n  =  50  test  points  a 
nominal  a  =  0.05  simultaneous  test  achieves  test  level  .047  using  uses  a  =  .011.  By 
contrast  the  Bonferroni  method  achieves  test  level  .006  using  a  =  .05/  49  =  .001 .  The 
improvement  is  even  more  dramatic  in  larger  samples.  For  m  =  500,  n  =  1000,  the  BAP 
test  achieves  level  .049  using  a  =  .00203  compared  to  an  achieved  level  of  .0018  using 
a  =  .05  /  999  =  .00005  by  Bonferroni. 

3.  A  Graphical  Example 

Figure  19  shows  20  observations  all  drawn  from  a  standard  bivariate  normal 
distribution.  The  plot  marker  for  each  point  is  its  sequence  label.  The  first  m  =  4  points 
are  the  control  set  and  are  circled  for  emphasis;  the  last  n  =  16  points  are  the  test  set. 
Since  all  the  test  data  are  from  the  same  distribution  as  the  control  data,  there  is  no 
change  point.  We  use  a  MATLAB®  implementation  of  the  Jonker-Volgenant 
assignment  algorithm  (1987)  provided  by  Levedahl  (2000)  to  compute  an  optimal 
bipartite  match  with  respect  to  Euclidean  distance  for  these  data;  the  resulting  pairs  are 

shown  connected  by  line  segments.  Table  4  shows  the  critical  envelope  qa  =(q1,...,ql 5) 

,  j 

and  the  test  statistic  Z  =  [Zl,...,Z15)  for  the  data  in  Figure  19.  Recall  that  k  —  l 

corresponds  to  Xm+1  =  X5,  k  =  2  corresponds  to  Xm+2  =  X6 ,  and  so  on,  and  Zk  is  equal 

to  the  number  of  pairings  of  between  the  sets  [Xx , . . . ,  X4  }  and  |X5 , . . . ,  X5+k_}  j .  For  this 

small  data  set  we  can  determine  the  Zk  values  by  inspection  from  Figure  19.  Since 
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(3.90) 


n 


(r,2)  = 


X5,  X7,  Xw,  andX15  are  the  test  set  elements  paired  with  elements  in  the  control  set, 

Zj  =  Z2  =  1 ,  Z3  =  Z4  =  Z5  =  2  ,  Z6  =  ■  ■  •  =  Z10  =  3 ,  and  Zn  =  ■  ■  •  =  Z15  =  4 .  None  of 

these  values  exceeds  the  critical  envelope,  so  we  do  not  reject  the  null  hypothesis  that  all 
the  test  data  share  the  same  distribution  as  the  control  data. 


Figure  19.  Minimum  weight  bipartite  matching  on  20  points;  m  =  4  and 
n  —  16  with  no  change  point.  The  control  set  is  circled;  line  segments 
connect  observations  that  are  paired  in  the  optimal  bipartite 
matching. 
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Table  4.  Critical  envelope  qa  and  BAP  test  statistic  Z  for  Figure  19  data.  Zk 
never  exceeds  qk ,  so  the  null  hypothesis  of  no  change  is  not  rejected. 
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In  contrast,  Figure  20  shows  the  same  control  data,  but  the  test  data  are  different 
in  that  a  change  point  exists  at  r  =  13,  where  the  index  r  is  in  reference  to  the  pooled 
data  set  {Xt , . . . , X20 } .  Specifically,  observations  {X5 , . . . , Xu  }  are  the  same  as  in  the  no¬ 
change  case,  but  {X13,...,X20}  are  translated  by  2  units  in  both  dimensions  from  the  no¬ 
change  case.  The  control  data  are  circled  as  before,  and  the  data  affected  by  the  change 
point  are  circumscribed  by  a  box  to  clearly  show  the  mean  shift. 


*i 


Figure  20.  Minimum  weight  bipartite  matching  on  20  points;  m  =  4  and 
n  — 16  with  a  change  point  r  —  13.  The  control  set  is  circled;  post¬ 
change  point  observations  are  boxed.  Line  segments  connect 
observations  that  are  paired  in  the  optimal  bipartite  matching. 
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Table  5.  Critical  envelope  qa  and  BAP  test  statistic  Z  for  Figure  20  data. 

Z6  >  q6 ,  so  the  null  hypothesis  of  no  change  is  rejected. 

Table  5  shows  qa  and  test  statistic  Z  for  Figure  20.  Since  Z6  >  q6,  we  reject  the  null 
hypothesis  in  favor  of  the  alternative  that  a  change  point  exists  in  the  test  data. 

An  alternative  to  the  BAP  test,  as  we  have  formulated  it  here,  is  required  for  the 
case  in  >  n  ,  where  there  exist  at  least  as  many  control  points  as  test  points,  since  the  BAP 
test  would  pair  every  test  point  to  a  control  point  in  such  cases.  A  matching-based 
possibility  appeals  to  previous  results:  First,  assign  to  each  point  in  the  control  set  some 
measure  of  centralness  or  depth  relative  to  that  set;  call  this  measure  the  observation 
“quality.”  Then  compute  an  optimal  bipartite  matching  and  let  Q  denote  the  quality  of 

the  control  observation  that  is  paired  in  the  matching  with  test  observation  Xm+. .  Since 
m  >  n  ,  every  observation  in  the  test  set  is  paired  with  some  observation  in  the  control  set, 
resulting  in  the  sequence  ■  Now  assign  ranks  to  this  sequence  and  perform  a 

change-point  test  on  it  based  on  ranks.  The  existence  of  a  change-point  in  the  rank 
sequence  corresponds  to  the  existence  of  a  change-point  in  the  test  set. 

The  BAP  test  invites  consideration  of  an  ensemble  extension  similar  to  the 
ensemble  extension  of  the  NAP  test.  That  is,  compute  the  BAP  statistic  for  each  optimal 
matching  in  an  orthogonal  sequence  of  optimal  matchings,  and  evaluate  the  collection 

{Z1,Z2,...,ZN_1J  for  additional  change-point  infonnation.  We  do  not  explore  this 

concept  here;  rather,  the  proposed  BAP  test  and  associated  discussion  only  begin  to 
examine  what  appear  to  be  rich  opportunities  to  apply  bipartite  matching  ideas  to  the 
change-point  problem,  and  we  mention  them  here  to  inspire  further  research.  In  the  next 
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chapter,  we  will  examine  the  performance  of  our  proposed  non-bipartite  matching 
methods  only,  and  leave  an  in-depth  study  of  bipartite  methods  for  future  work. 
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IV.  SIMULATION  STUDY 


A.  METHODOLOGY 

The  usefulness  of  the  tests  presented  in  this  paper  lies  in  their  power  to  identify 
change  points  under  a  wide  range  of  alternative  hypotheses.  In  this  chapter,  we  present 
the  results  of  computer  simulations  that  compare  the  power  of  the  Sum  of  Pair  Maxima 
(SPM),  Non-Bipartite  Accumulated  Pairs  (NAP),  and  Ensemble  Sum  of  Pair  Maxima 
(ESPM)  tests  for  a  variety  of  scenarios.  We  compare  these  tests  to  the  James,  James,  and 
Siegmund  (JJS)  (1992)  test  discussed  in  Section  II. B. 2.  In  every  case  the  sample  space  is 
R"'  (where  d  is  observation  dimension),  sample  size  is  N  —  200 ,  and  test  significance 
level  is  a  —  0.05  .  The  choice  N  =  200  is  based  on  a  desire  to  investigate  test  behavior 
for  a  moderately  large  sample  size  while  avoiding  excessive  computation  times  for  large 
simulations.  Detection  power  is  the  performance  metric,  where  power  is  defined  as  the 
probability  of  rejecting  the  null  hypothesis  when  it  is  false.  Each  power  estimate  in  the 
tables  of  this  chapter  is  the  fraction  of  times  that  a  particular  test  indicates  that  a  change 
point  has  occurred  under  the  given  conditions,  based  on  1000  simulations.  We  use  the 
Mahalanobis  distance  function  given  by  (2.16)  (estimating  V  by  the  sample  covariance 
matrix)  as  a  natural  choice  to  determine  interpoint  costs  in  R"'  unless  otherwise  specified. 
Section  C.4  of  this  chapter  shows  performance  results  for  cases  where  different  cost 
functions  are  considered.  In  all  other  cases,  the  scenarios  vary  according  to  the  following 
factors: 

1)  Underlying  distribution  family.  We  consider  the  following  probability 
distribution  families: 

a)  multivariate  nonnal,  denoted  FMVN , 

b)  a  multivariate  normal  mixture,  denoted  Fmix ,  as  a  heavy-tailed  case, 
and 

c)  a  multivariate  Weibull  distribution,  denoted  FWeib ,  as  a  skewed  case. 
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2)  Dimension.  We  evaluate  two  dimensions:  d  —  5  and  d  —  20  . 

3)  Change-point  location.  We  examine  cases  where  the  change  point  occurs  in 
the  middle  and  toward  the  end  of  the  observation  sequence. 

4)  Change  parameter.  We  consider  changes  in  distribution  mean  and  scale. 

5)  Type  of  change.  At  a  change  point,  we  consider  cases  where  the  parameter 
undergoing  change  does  so  in  an  abrupt  Oump)  or  gradual  (drift)  manner. 

6)  Magnitude  of  change.  We  examine  changes  of  various  magnitudes. 


Specifically,  FMVN  is  the  cumulative  distribution  associated  with  density  function 


(4-1)  /MVN(x;p,£): 


1 


1 


(2tt) 


N/2  i^,l/2 


exp  --(x-p)  E  '(x-|i)  Vxe: 


where  peM"'  and  E  e  M'/  f/  are  the  distribution  mean  and  covariance  matrix, 
respectively.  The  in-control  data  for  the  multivariate  normal  case  have  p  =  0  and 
E  =  Id ,  where  I d  is  the  d  x  d  identity  matrix.  The  post-change-point  data  have  a 
different  mean  or  scale  as  specified  in  the  next  section. 


Fmix  is  constructed  as  follows:  Let  U  ~  FMVN ,  let  Z  be  a  Bernoulli  random 
variable  with  success  probability  pmx ,  and  let  crmix  be  some  scalar  larger  than  1 .  Then 
the  random  variable 

(4.2)  *  =  [1  +  K*-1  )Z]U~F^, 


and  pmiK  and  crmix  are  the  proportion  and  scale  of  the  mixing,  respectively.  For  this 
study,  we  set  Pmix  =  0. 1  and  (Jmix  =  4  . 

FWeibis  defined  to  be  the  distribution  on  M"  associated  with  d  independent 
identically  distributed  univariate  Weibull  random  variables;  that  is, 

x  =  (x1,...,xrf)etj , 

0  otherwise , 


(4.3) 


(*;>?.  P)= 


n 


1  — exp 


Pi 


\v 
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where  r)  and  /?.  are  the  univariate  Weibull  shape  and  scale  parameters,  respectively,  and 
denotes  the  closed  upper  half-space  of  W1 .  For  this  study  we  set  //  =  1.5.  For  the 
in-control  data,  P  =  1 ;  P  varies  for  the  post-change-point  data  as  specified  in  the  next 
section. 

We  consider  a  change  point  at  r  =  101  (for  a  50-50  percent  split  of  pre-  and  post¬ 
change  data)  and  r  =  161  (for  an  80-20  percent  split  of  pre-  and  post-change  data).  We 
examine  change  magnitudes,  denoted  by  A ,  ranging  from  0  to  1.0  by  increments  of  0.25. 
For  the  case  A  =  0  the  null  hypothesis  is  true  and  the  tabulated  power  at  A  =  0  is  an 
estimate  of  the  test’s  false  alarm  rate,  which  is  the  likelihood  that  a  test  rejects  the  null 
hypothesis  when  it  is  in  fact  true.  False  alarm  rate  ideally  is  0.05  at  significance  level 
a  —  0.05  .  For  the  case  A  >  0,  A  indicates  the  total  magnitude  of  the  change  from  the 
first  to  last  observation.  So,  for  a  jump  change  A  is  magnitude  of  the  abrupt  change  that 
occurs  at  change  point  r .  For  a  drift  change,  the  parameter  undergoing  change  varies 
linearly  from  its  reference  level  so  that  the  total  change  magnitude  between  the  first  and 
last  observation  is  A .  For  changes  in  distribution  mean,  we  simulate  the  change  in  the 
first  component  of  each  observation  only.  For  changes  in  distribution  scale,  we  simulate 
the  change  in  all  components.  Figures  21-28  shows  cases  of  mean  and  scale  changes  of 
jump  and  drift  variety  for  change  points  at  r  =  101  or  r  =  161  associated  with  the 
multivariate  normal  case  for  illustration  purposes.  The  x-axis  is  sequence  label  i;  the  y- 
axis  is  the  value  of  the  first  component  of  the  itb  observation. 
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Example  of  jump  change  in  mean:  A  =1;  r=101 


X' 


Observation 
Mean  value 
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Observation  sequence  label,  i 


160  ISO  200 


Figure  21.  Typical  change  scenario:  mean  jump  at  r  =  101 .  The  mean  jumps 
in  the  first  dimension  only  and  remains  fixed  in  all  other  dimensions. 


Example  of  drift  change  in  mean:  A  =1;  r=1 01 
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Figure  22.  Typical  change  scenario:  mean  drift  at  r  =  101 .  The  mean  drifts 
in  the  first  dimension  only  and  remains  fixed  in  all  other  dimensions. 
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Figure  23. 
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Figure  24. 


Example  of  jump  change  in  mean:  A  =1;  r=161 
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Typical  change  scenario:  mean  jump  at  r  =  161 .  The  mean  jumps 
in  the  first  dimension  only  and  remains  fixed  in  all  other  dimensions. 


Example  of  drift  change  in  mean:  A  =1:  r  =1 61 
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Typical  change  scenario:  mean  drift  at  r  =  161.  The  mean  drifts 
in  the  first  dimension  only  and  remains  fixed  in  all  other  dimensions. 
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Example  of  jump  change  in  scale:  A  =1;  r=1 01 
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♦  *  r  *  *  ♦♦♦  * — 

*  *♦  *  *  V  -*-*-!  *  *  ♦  *  ♦  ♦ 

♦  ♦  ♦  «*  *«**♦  ♦* 

-  .  ..  .  *  *  •/.  * 

- 

I  4  .  ’  .  .  ’  ’ 

_ 1 _ 1 _ 

_ 1 _ 1 _ 1 _ 1 _ 1 _ 1 _ 1 _ 

X~ 


20  40 


60  80  100  120  140 

Observation  sequence  label,  / 


160  180  200 


Figure  25.  Typical  change  scenario:  scale  jump  at  r  =  101 .  The  scale  jumps 
in  all  dimensions. 


Example  of  drift  change  in  scale:  A  =1;  r=101 

•  Observation 

- +/- 1  sd 


0  20  40  00  SO  100  120  140  160  ISO  200 

Observation  sequence  label,  / 

Figure  26.  Typical  change  scenario:  scale  drift  at  r  =  101 .  The  scale  jumps  in 
all  dimensions. 
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Example  of  jump  change  in  scale:  A  =1 :  r=1 61 


“i - r~ 


Observation 
+/-1  sd 


X~ 


- - - - - - * - -* - - - - ^ 


20  40 


60  80  100  120  140 

Observation  sequence  label,  / 


_i _ i_ 

160  180  200 


Figure  27.  Typical  change  scenario:  scale  jump  at  r  =  161 .  The  scale  jumps 
in  all  dimensions. 


Example  of  drift  change  in  scale:  A  =1;  r=161 


XT 


Observation 
+/-1  sd 


“i - r~ 


20  40 


60  80  100  120  140 

Observation  sequence  label,  / 


Figure  28.  Typical  change  scenario:  scale  drift  at  r  =  161 .  The  scale  jumps  in 
all  dimensions. 
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B. 


PERFORMANCE  RESULTS 


Tables  6-13  show  power  estimates  for  the  Sum  of  Pair-Maxima  (SPM),  Non- 
Bipartite  Accumulated  Pairs  (NAP),  Ensemble  Sum  of  Pair-Maxima  (ESPM)  and  James, 
James,  and  Siegmund  (JJS)  tests  for  a  variety  of  scenarios  at  test  level  a  —  0.05 . 
Attained  significance  levels  for  the  SPM  and  NAP  tests  with  N  —  200  are 
aSPM  =  0.04962  and  aNAP  =  0.04957  .  ESPM  critical  values  are  obtained  via  simulation 
(see  Appendix  B).  JJS  critical  values  are  analytically  based  on  the  assumption  that  the 
underlying  distribution  is  multivariate  normal  with  a  common  but  unknown  covariance 
matrix.  While  the  JJS  test  is  analytically  designed  to  detect  abrupt  mean  changes  only, 
we  observe  in  our  simulations  that  it  is  also  sensitive  to  gradual  mean  changes  and  scale 
changes  while  maintaining  a  false  alann  rate  consistent  with  test  significance  level. 
Therefore,  we  consider  the  JJS  test  for  comparison  in  those  cases  as  well. 


Each  table  specifies  the  distribution,  dimensionality,  change  parameter  (mean  or 
scale),  and  change  type  (jump  or  drift)  along  the  top.  The  left-most  column  shows  the 
total  magnitude  of  the  change  for  the  varying  parameter  in  that  case.  The  change 
magnitude  at  i  is  A.  =  A  for  jump  changes  and  A.  =  (i  —  r  +  l)A/(A  —  t  +  1)  for  drift 


changes.  For  a  mean  change  at  change  point  r ,  observation  i  is  distributed  as  follows: 

.  MVN  '  1  for  the  MVN  case; 

X,-(A,.,0,0,...,0)~Fmvn,  i  >  t\ 

(4.4) 

x  ~  .  i  <  r  1 

for  the  mixture  case. 


X,-(A„0,0,...,0)~F^,  i>  t 


We  model  changes  in  the  mean  vector  by  changing  only  its  first  component  because  of 
the  rotational  invariance  of  FMVN  and  Fmix .  For  a  scale  change  at  change  point  r , 
observation  i  is  distributed  as  follows: 
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for  the  MVN  case; 


(4.5) 


Xt  ~  ^mvn  ’  Z  <  T 


X,. 


1  +  A,. 


F  i  >  r 

7  MVN  ’  *  —  ' 


X.~F  ,  /  <  r 

i  mix  ? 


X 


1  +  A,. 


Fmix. 


for  the  mixture  case. 


For  fixed  r] ,  a  change  in  p  results  in  a  mean  and  scale  change  for  the  Weibull 

distribution.  We  change  only  the  first  component  of  P  to  be  consistent  with  the  other 

two  distribution  family  cases,  so  observation  i  is  distributed  as  follows: 

X.  ~  FWeib  with  p  =  1,  i  <  t  1 

(4.6)  .  [for  the  MV  Weibull  case. 

X,  ~  +weib  with  P  =  (l  +  A. ,  1, . . . , l) ,  i>rj 

1.  Multivariate  Normal  Case 


o.  Changes  in  Location 

Tables  6-10  present  power  estimates  for  the  multivariate  nonnal  scenarios 
under  different  alternatives.  Tables  6-8  are  associated  with  a  mean  change.  One  would 
expect  the  JJS  test  to  be  superior  to  nonparametric  tests  for  the  mean  jump  cases,  as  JJS  is 
a  parametric  test  based  on  the  assumptions  of  multivariate  normality  and  a  single  abrupt 
change  in  distribution  mean.  Our  simulation  results  suggest  that  the  JJS  test  is  superior 
overall  in  both  the  jump  and  drift  cases.  However,  both  the  SPM  and  NAP  tests  show 
appreciable  power  in  each  case,  and  even  more  noteworthy  that  the  power  of  the  ESPM 
test  is  comparable  to  JJS  in  some  cases.  In  particular,  when  the  change  point  occurs  in 
the  middle  of  the  observation  sequence  (Tables  6  and  7),  the  JJS  test  and  ESPM  test 
perform  comparably.  When  the  change  point  is  away  from  the  middle  of  the  observation 
sequence  (Table  8)  all  the  tests  suffer  somewhat,  since  fewer  post-change  data  are  present 
to  indicate  a  change.  However,  the  nonparametric  tests  seem  to  suffer  more  power  loss 
than  the  JJS  test.  Furthermore,  the  power  of  all  four  tests  is  reduced  in  higher  dimension 
for  changes  of  fixed  magnitude  (compare  Table  6  with  d  —  5  to  Table  7  with  d  =  20), 
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since  the  magnitude  of  the  change  becomes  a  smaller  fraction  of  the  average  distance 
between  points  as  dimensionality  increases.  This  effect  appears  to  impact  the  ESPM  and 
JJS  tests  comparably. 


F  ■  d  =  5- 
mean  change 

Jump;  r  =  101 

A 

SPM 

NAP 

ESPM 

JJS 

0.00 

0.06 

0.05 

0.04 

0.05 

0.25 

0.07 

0.05 

0.18 

0.14 

0.50 

0.10 

0.09 

0.60 

0.52 

0.75 

0.23 

0.18 

0.93 

0.93 

1.00 

0.41 

0.33 

1.00 

1.00 

Drift;  r  =  101 

SPM 

NAP 

ESPM 

JJS 

0.04 

0.05 

0.06 

0.07 

0.06 

0.05 

0.10 

0.09 

0.07 

0.05 

0.27 

0.22 

0.11 

0.09 

0.55 

0.53 

0.20 

0.16 

0.84 

0.85 

SPM:  Sum  of  Pair-Maxima  NAP:  Non-Bipartite  Accumulated  Pairs 

ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 

Table  6.  Test  power  to  detect  a  mean  change  of  magnitude  A  for  MVN  case 
with  dimension  d  =  5  and  change  point  r  =  101  based  on  N  —  200 , 
a  =  0.05 ,  and  1000  simulations.  Jump  case  is  to  the  left;  drift  case  is  to  the 

right. 


FMvH’d  =  20; 
mean  change 

Jump;  r  =  101 

A 

SPM 

NAP 

ESPM 

JJS 

0.00 

0.05 

0.05 

0.05 

0.03 

0.25 

0.07 

0.06 

0.11 

0.07 

0.50 

0.09 

0.07 

0.33 

0.20 

0.75 

0.11 

0.09 

0.71 

0.63 

1.00 

0.22 

0.16 

0.95 

0.95 

Drift;  r  =  101 

SPM 

NAP 

ESPM 

JJS 

0.05 

0.05 

0.05 

0.04 

0.05 

0.05 

0.07 

0.04 

0.07 

0.05 

0.13 

0.09 

0.08 

0.07 

0.31 

0.23 

0.11 

0.09 

0.56 

0.49 

SPM:  Sum  of  Pair-Maxima  NAP:  Non-Bipartite  Accumulated  Pairs 

ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 

Table  7.  Test  power  to  detect  a  mean  change  of  magnitude  A  for  MVN  case 
with  dimension  d  =  20  and  change  point  r  =  101  based  on  N  =  200  , 
a  =  0.05 ,  and  1000  simulations.  Jump  case  is  to  the  left;  drift  case  is  to  the 

right. 
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F  ■  d  =  5- 
mean  change 

Jump;  r  =  161 

A 

SPM 

NAP 

ESPM 

JJS 

0.00 

0.05 

0.06 

0.06 

0.06 

0.25 

0.06 

0.05 

0.08 

0.08 

0.50 

0.07 

0.08 

0.22 

0.30 

0.75 

0.12 

0.10 

0.52 

0.72 

1.00 

0.19 

0.20 

0.81 

0.96 

Drift;  r  =  161 

SPM 

NAP 

ESPM 

JJS 

0.05 

0.06 

0.05 

0.05 

0.05 

0.05 

0.06 

0.07 

0.06 

0.06 

0.10 

0.12 

0.06 

0.08 

0.15 

0.27 

0.05 

0.08 

0.28 

0.51 

SPM:  Sum  of  Pair-Maxima  NAP:  Non-Bipartite  Accumulated  Pairs 

ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 

Table  8.  Test  power  to  detect  a  mean  change  of  magnitude  A  for  MVN  case 
with  dimension  d  =  5  and  change  point  r  =  161  based  on  N  =  200 , 
a  =  0.05 ,  and  1000  simulations.  Jump  case  is  to  the  left;  drift  case  is  to  the 

right. 


b.  Changes  in  Scale 

Tables  9  and  10  show  power  estimates  for  multivariate  normal  scale 
changes.  Note  that  the  case  in  Table  9  (d  —  5,  r  =  101 )  varies  from  the  case  in  Table  10 
(d  =  20  ,  r  =  161)in  both  dimensionality  and  change  point.  We  observe  that  the  SPM 
and  NAP  tests  demonstrate  reasonable  power  to  detect  scale  changes,  while  the  ESPM 
test  shows  impressive  power  to  do  so.  Recall  that  the  JJS  test  is  not  specifically  designed 
to  detect  scale  changes;  however,  it  does.  Interestingly,  JJS  exhibits  its  worst  power 
among  these  scenarios  in  lower  dimension  with  a  50-50  split  and  mean  jump,  and  its  best 
power  in  higher  dimension  with  an  80-20  split  and  mean  drift. 
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F  d  =  5- 

scale  change 

Jump;  r  =  101 

A 

SPM 

NAP 

ESPM 

JJS 

0.00 

0.06 

0.05 

0.05 

0.04 

0.25 

0.16 

0.11 

0.32 

0.09 

0.50 

0.51 

0.42 

0.97 

0.15 

0.75 

0.88 

0.88 

1.00 

0.19 

1.00 

0.99 

0.99 

1.00 

0.24 

Drift;  r  =  101 

SPM 

NAP 

ESPM 

JJS 

0.05 

0.05 

0.05 

0.05 

0.08 

0.08 

0.15 

0.14 

0.27 

0.20 

0.52 

0.27 

0.52 

0.46 

0.92 

0.42 

0.79 

0.77 

1.00 

0.54 

SPM:  Sum  of  Pair-Maxima  NAP:  Non-Bipartite  Accumulated  Pairs 

ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 

Table  9.  Test  power  to  detect  a  scale  change  of  magnitude  A  for  MVN  case 
with  dimension  d  =  5  and  change  point  r  =  101  based  on  N  =  200 , 
a  =  0.05 ,  and  1000  simulations.  Jump  case  is  to  the  left;  drift  case  is  to  the 

right. 


^MVN’  d  ~  20, 
scale  change 

Jump;  r  =  161 

A 

SPM 

NAP 

ESPM 

JJS 

0.00 

0.05 

0.06 

0.05 

0.04 

0.25 

0.08 

0.08 

0.24 

0.26 

0.50 

0.25 

0.28 

0.83 

0.66 

0.75 

0.56 

0.75 

1.00 

0.90 

1.00 

0.85 

0.99 

1.00 

0.98 

Drift;  r  =  161 

SPM 

NAP 

ESPM 

JJS 

0.05 

0.05 

0.06 

0.04 

0.06 

0.07 

0.10 

0.23 

0.09 

0.11 

0.25 

0.70 

0.17 

0.25 

0.55 

0.93 

0.28 

0.49 

0.86 

0.99 

SPM:  Sum  of  Pair-Maxima  NAP:  Non-Bipartite  Accumulated  Pairs 

ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 

Table  10.  Test  power  to  detect  a  scale  change  of  magnitude  A  for  MVN  case 
with  dimension  d  =  5  and  change  point  r  =  161  based  on  N  =  200 , 
a  —  0.05 ,  and  1000  simulations.  Jump  case  is  to  the  left;  drift  case  is  to  the 

right. 
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2. 


Multivariate  Normal  Mixture  Case 


a.  Changes  in  Location 

Table  1 1  shows  power  results  for  the  case  of  an  underlying  multivariate 
nonnal  mixture  with  pmix  =0.10  and  <xmix  =4  as  an  example  of  a  heavy-tailed 
case: 


SPM:  Sum  of  Pair-Maxima  NAP:  Non-Bipartite  Accumulated  Pairs 

ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 

Table  11.  Test  power  to  detect  a  mean  change  of  magnitude  A  for  MVN 
mixture  case  with  dimension  d  =  5  and  change  point  r  =  101  based  on 
N  =  200  ,  a  =  0.05 ,  and  1000  simulations.  Jump  case  is  to  the  left;  drift  case 
is  to  the  right.  Shading  indicates  excessive  false  alarm  rate. 


Drift;  r  =  101 

SPM 

NAP 

ESPM 

JJS 

0.04 

0.04 

0.06 

0.28 

0.07 

0.06 

0.09 

0.26 

0.07 

0.07 

0.21 

0.33 

0.10 

0.09 

0.47 

0.39 

0.15 

0.12 

0.76 

0.55 

F^\d  =  5\ 

mean  change 

Jump;  r  =  101 

A 

SPM 

NAP 

ESPM 

JJS 

0.00 

0.05 

0.05 

0.04 

0.27 

0.25 

0.07 

0.05 

0.14 

0.28 

0.50 

0.09 

0.08 

0.56 

0.38 

0.75 

0.20 

0.15 

0.88 

0.61 

1.00 

0.36 

0.25 

0.99 

0.85 

The  matching-based  tests  demonstrate  results  comparable  to  their 
respective  powers  in  the  similar  multivariate  nonnal  case  (compare  to  Table  6)  and  they 
retain  a  false  alarm  rate  consistent  with  test  significance  level.  However,  the  false  alarm 
rate  for  the  JJS  test  far  exceeds  5%  and  therefore  disqualifies  it  for  comparison  at  the  0.05 
test  level.  We  explore  this  phenomenon  in  a  separate  study  using  10,000  simulations 
( N  =  200 ,  50-50  split)  and  find  that  the  JJS  false  alarm  rates  exceed  test  level  even  for 
small  mixing  probabilities.  The  problem  gets  worse  as  dimensionality  increases,  as 
shown  in  Figure  29. 
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Effect  of  mixing  proportion  on  J JS  false  alarm  rate  for  nominal  a  =  0.05  and  crmjx  =  4 

I  i  - 1 - 1 - 1 - 1 - 1 - 1 - 1 - 1 - 


. d  =  2 

- d  =  5 


Q  _ I _ I _ I _ I _ I _ I _ I _ I _ I _ 

0  0.01  0.02  0.03  0.04  0.05  0.06  0.07  0.08  0.09  0.1 

P 

'mix 

Figure  29.  Effect  of  mixing  proportion  on  JJS  false  alarm  rate  for  MVN 
mixture  cases  with  dimension  d  =  2,  5,  and  20  based  on  N  =  200  , 

a  =  0.05 ,  and  10,000  simulations.  Scale  of  mixing  amix  =  4 ; 
proportion  of  mixing  pmix  varies  along  the  jc-axis. 

b.  Changes  in  Scale 

As  in  the  multivariate  normal  case,  Table  12  shows  that  the  SPM  and  NAP 
tests  have  fair  power  to  detect  scale  changes  and  the  ESPM  test  has  noteworthy  power  to 
do  so.  Again,  excessive  false  alarm  rate  makes  JJS  an  unacceptable  test  for  these 
scenarios.  These  multivariate  normal  mixture  cases  of  changing  location  and  scale 
highlight  the  utility  of  nonparametric  change-point  approaches  such  as  the  SPM,  NAP, 
and  ESPM  tests  in  that  they  are  not  limited  by  strict  distributional  assumptions. 
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*>d  =  5; 
scale  change 

Jump;  r  =  101 

A 

SPM 

NAP 

ESPM 

JJS 

0.00 

0.05 

0.05 

0.05 

0.25 

0.25 

0.12 

0.09 

0.31 

0.29 

0.50 

0.38 

0.28 

0.92 

0.31 

0.75 

0.75 

0.69 

1.00 

0.32 

1.00 

0.93 

0.92 

1.00 

0.32 

Drift;  r  =  101 

SPM 

NAP 

ESPM 

JJS 

0.06 

0.06 

0.05 

0.29 

0.08 

0.08 

0.13 

0.32 

0.16 

0.13 

0.48 

0.39 

0.35 

0.27 

0.85 

0.45 

0.54 

0.45 

0.99 

0.50 

SPM:  Sum  of  Pair-Maxima  NAP:  Non-Bipartite  Accumulated  Pairs 

ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 

Table  12.  Test  power  to  detect  a  scale  change  of  magnitude  A  for  MVN  mixture 
case  with  dimension  d  =  5  and  change  point  r  =  101  based  on  N  =  200  , 
a  =  0.05 ,  and  1000  simulations.  Jump  case  is  to  the  left;  drift  case  is  to  the 
right.  Shading  indicates  excessive  false  alarm  rate. 


3.  Multivariate  Weibull  Case 

Table  13  presents  power  results  associated  with  the  multivariate  Weibull 
distribution  as  an  example  of  a  skewed  case.  For  these  simulations  shape  parameter 
rj  =  1.5  remains  fixed  while  scale  parameter  P  varies  from  1  to  2  in  the  first  dimension 
only;  this  corresponds  to  coincident  changes  in  both  location  and  scale.  While  false 
alarm  rates  for  the  JJS  test  are  not  as  excessive  for  this  case  as  for  the  multivariate 
mixture  case,  they  still  clearly  violate  the  specified  0.05  test  level.  As  before,  the 
matching-based  tests  appear  to  respect  test  level. 
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F  •  d  =  5- 
/?  change 

A 

0.00 

0.25 

0.50 

0.75 

1.00 


Jump;  r  =  101 

SPM 

NAP 

ESPM 

JJS 

0.05 

0.05 

0.06 

0.09 

0.08 

0.06 

0.25 

0.22 

0.13 

0.10 

0.70 

0.70 

0.25 

0.20 

0.95 

0.96 

0.34 

0.27 

0.99 

1.00 

Drift;  r  =  101 

SPM 

NAP 

ESPM 

JJS 

0.06 

0.05 

0.05 

0.09 

0.06 

0.07 

0.12 

0.17 

0.08 

0.07 

0.35 

0.46 

0.15 

0.12 

0.70 

0.81 

0.23 

0.20 

0.86 

0.94 

SPM:  Sum  of  Pair-Maxima  NAP:  Non-Bipartite  Accumulated  Pairs 

ESPM:  Ensemble  Sum  of  Pair-Maxima  JJS:  James,  James,  and  Siegmund  test 

Table  13.  Test  power  to  detect  a  change  in  the  scale  parameter  f3  of  magnitude 
A  for  MV  Weibull  case  with  dimension  d  =  5  and  change  point  r  =  101 
based  on  N  =  200  ,  a  =  0.05  ,  and  1000  simulations.  Jump  case  is  to  the  left; 
drift  case  is  to  the  right.  Shading  indicates  excessive  false  alarm  rate. 


In  summary,  the  Sum  of  Pair-Maxima  (SPM),  Non-Bipartite  Accumulated  Pairs 
(NAP),  and  Ensemble  Sum  of  Pair-Maxima  (ESPM)  tests  all  demonstrate  power  to  detect 
a  change  point  in  every  examined  case  for  different  underlying  different  distributions, 
dimensionality,  change-point  location,  change  parameter,  and  type  of  change  while 
achieving  a  significance  level  consistent  with  nominal  test  level.  The  power  of  each  test 
is  reduced  as  dimension  increases  or  as  change-point  location  moves  away  from  the 
middle  of  the  observation  sequence. 

The  ESPM  test  outperforms  the  SPM  and  NAP  tests  in  every  case  and  is 
preferable  among  the  three  tests  for  use.  The  ESPM  test  has  power  comparable  to  the 
parametric  JJS  test  in  the  case  of  a  mean  change  when  the  underlying  distribution  is 
multivariate  normal,  except  perhaps  when  the  change-point  is  far  away  from  the  center  of 
the  sequence.  The  ESPM  test  is  preferable  to  the  JJS  test  in  non-normal  cases  due  to 
excessive  JJS  false  alarm  rates,  and  is  superior  in  detecting  scale  changes  when  the 
underlying  distribution  is  normal.  The  NAP  test  should  be  considered  for  use  if  one 
desires  information  about  the  location  of  a  change  point  in  addition  to  detecting  whether 
or  not  a  change-point  exists. 
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c. 


DIFFERENT  COST  FUNCTIONS 


To  gain  insight  regarding  the  impact  of  cost  function  selection  we  compare  the 
performance  of  the  ESPM  test  using  Mahalanobis  distance  (MD),  Euclidean  distance 
(ED),  Mahalanobis  distance,  robust  (MD-R),  and  multivariate  rank  distance  (RD)  as 
defined  in  (2.15)-(2.23)  for  a  few  representative  cases.  We  list  MD  in  the  first  column  as 
the  reference  cost  measure  which  was  used  for  all  previous  cases  in  the  simulation  study. 
For  MD-R  we  set  the  nearest-neighbor  parameter  k  (as  identified  in  the  discussion 
preceding  equation  (2.17)  )  equal  to  8,  which  is  in  the  range  of  recommended  values  for 
that  parameter  given  by  Wang  and  Raferty  (2002). 

In  every  case,  MD  and  MD-R  performance  are  nearly  identical,  and  ED  and  RD 
performance  are  nearly  identical.  ED  and  RD  perfonn  as  well  or  better  than  MD  and 
MD-R;  this  performance  difference  is  more  evident  in  higher  dimension  and  is  most 
evident  in  for  the  multivariate  Weibull  case.  These  differences  seem  attributable  to  the 
fact  that  MD  and  MD-R  must  estimate  the  covariance  of  the  underlying  distribution, 
while  for  the  cases  we  examine  the  underlying  covariance  is  very  close  to  the  identity 
covariance  assumed  by  ED  and  RD. 


FMVn;ESPM; 
mean  change 

d  = 

5 ;  jump;  r  = 

101 

A 

MD 

ED 

MD-R 

RD 

0.00 

0.06 

0.06 

0.05 

0.06 

0.25 

0.16 

0.16 

0.16 

0.16 

0.50 

0.56 

0.58 

0.56 

0.57 

0.75 

0.93 

0.94 

0.93 

0.95 

1.00 

1.00 

1.00 

1.00 

1.00 

MD:  Mahalanobis  distance 
MD-R:  Mahalanobis  distance,  robust 


d  =  20 ;  jump;  r  =  101 

MD 

ED 

MD-R 

RD 

0.05 

0.05 

0.05 

0.05 

0.11 

0.12 

0.11 

0.12 

0.31 

0.36 

0.30 

0.36 

0.72 

0.80 

0.71 

0.79 

0.96 

0.99 

0.96 

0.99 

ED:  Euclidean  distance 
RD:  Multivariate  rank  distance 


Table  14.  Ensemble  Sum  of  Pair-Maxima  (ESPM)  test  power  to  detect  a  mean 
change  of  magnitude  A  for  MVN  case  with  dimension  d  =  5  and  change 
point  r  =  101  under  different  cost  functions,  based  on  N  =  200  ,  a  =  0.05  , 
and  1000  simulations.  Jump  case  is  to  the  left;  drift  case  is  to  the  right. 
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/A,;  ESPM; 
mean  change 

d  = 

5 ;  jump;  r  = 

101 

A 

MD 

ED 

MD-R 

RD 

0.00 

0.05 

0.05 

0.05 

0.05 

0.25 

0.16 

0.15 

0.15 

0.16 

0.50 

0.46 

0.50 

0.49 

0.52 

0.75 

0.89 

0.91 

0.90 

0.90 

1.00 

0.99 

0.99 

0.99 

0.99 

MD:  Mahalanobis  distance 
MD-R:  Mahalanobis  distance,  robust 


d  =  20  ;  jump;  r  =  101 

MD 

ED 

MD-R 

RD 

0.06 

0.05 

0.06 

0.06 

0.10 

0.09 

0.10 

0.09 

0.26 

0.34 

0.28 

0.33 

0.57 

0.69 

0.62 

0.69 

0.86 

0.95 

0.91 

0.95 

ED:  Euclidean  distance 
RD:  Multivariate  rank  distance 


Table  15.  Ensemble  Sum  of  Pair-Maxima  (ESPM)  test  power  to  detect  a  mean 
change  of  magnitude  A  for  MVN  mixture  case  with  dimension  d  =  5  and 
change  point  r  =  101  under  different  cost  functions,  based  on  V  =  200 , 
a  —  0.05 ,  and  1000  simulations.  Jump  case  is  to  the  left;  drift  case  is  to  the 

right. 


FWeib;ESPM; 

(3  change 

d  = 

5 ;  jump;  r  = 

101 

A 

MD 

ED 

MD-R 

RD 

0.00 

0.05 

0.06 

0.05 

0.05 

0.25 

0.24 

0.28 

0.25 

0.28 

0.50 

0.70 

0.77 

0.71 

0.76 

0.75 

0.93 

0.97 

0.95 

0.97 

1.00 

0.99 

1.00 

1.00 

1.00 

MD:  Mahalanobis  distance 
MD-R:  Mahalanobis  distance,  robust 


d  =  20 ;  jump;  r  =  101 

MD 

ED 

MD-R 

RD 

0.05 

0.04 

0.05 

0.04 

0.14 

0.19 

0.15 

0.18 

0.46 

0.66 

0.47 

0.67 

0.73 

0.95 

0.77 

0.95 

0.93 

1.00 

0.95 

1.00 

ED:  Euclidean  distance 
RD:  Multivariate  rank  distance 


Table  16.  Ensemble  Sum  of  Pair-Maxima  (ESPM)  test  power  to  detect  a  change 
in  the  scale  parameter  f3  of  magnitude  A  for  MV  Weibull  case  with 
dimension  d  —  5  and  change  point  r  =  101  under  different  cost  functions, 
based  on  N  —  200  ,  a  —  0.05  ,  and  1000  simulations.  Jump  case  is  to  the  left; 

drift  case  is  to  the  right. 
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D.  COMPUTATIONAL  DETAILS 


Simulations  in  Section  B  of  this  chapter  were  performed  using  R  version  2.9.0  on 
the  Hamming  cluster  of  the  Naval  Postgraduate  School’s  High  Performance  Computing 
Center  (HPC),  which  is  a  Sun  Microsystems  6048  Blade  modular  system  with  1152 
processing  cores.  We  computed  non-bipartite  weighted  matchings  using  Kolmogorov’s 
(2009)  Blossom  V  algorithm.  In  its  published  form,  Kolmogorov’s  algorithm  computes  a 
single  optimal  non-bipartite  matching  on  a  set  of  N  observations.  We  modified  this 
routine  slightly  in  the  source  language  so  that  it  computes  a  full  sequence  of  orthogonal 
successive  optimal  matchings  that  rather  than  just  a  single  matching.  Table  17  shows 
typical  realized  runtimes  to  compute  N  /  2  orthogonal  successive  optimal  matchings 
calling  Kolmogorov’s  routine  in  R  for  various  sample  sizes. 


N 

Run  time  (sec) 

20 

<0.01 

50 

0.01 

100 

0.05 

200 

0.60 

300 

2.9 

400 

11 

500 

48 

Table  17.  Typical  time  to  compute  N  /  2  orthogonal  successive  optimal 
matchings  calling  Kolmogorov’s  algorithm  (modified  to  compute  orthogonal 
successive  optimal  matchings)  in  R. 


We  realized  significant  reductions  (at  least  two  orders  of  magnitude)  in  total  simulation 
time  by  taking  advantage  of  the  HPC’s  batch  job  scheduling  capability. 

Simulations  in  Section  C  were  performed  using  R  version  2.9.0  on  a  Windows  XP 
machine  with  an  Intel  ®  Pentium  ®  4  3.4  GHz  processor.  We  computed  non-bipartite 
weighted  matchings  using  Derigs’s  (1988)  algorithm.  The  algorithm  itself  is  in  the 
FORTRAN  programming  language;  compiled  code  is  embedded  in  a  dynamic  link 
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library  file  that  can  be  called  directly  in  R,  provided  by  Professor  Bo  Lu  at  the  Ohio  State 
University.  In  its  current  form  this  algorithm  requires  that  edge  weights  be  integer¬ 
valued.  For  our  purposes,  we  accommodated  this  requirement  by  rounding  non-integer 
costs  to  four  decimal  places  and  then  scaling  each  cost  by  104 .  Additionally,  this  routine 
requires  the  assignment  of  a  prohibitive  cost  to  the  diagonal  of  any  cost  matrix  to  block 
the  pairing  of  an  observation  with  itself.  For  this  study  we  set  this  prohibitive  cost  to  be 
N  /  2  times  the  maximum  of  all  interpoint  costs.  We  used  this  same  prohibitive  cost  for 
blocking  to  compute  orthogonal  successive  optimal  matchings.  Table  18  shows  typical 
realized  runtimes  to  compute  N 12  orthogonal  successive  optimal  matchings  using 
Derigs’s  algorithm  in  R  for  various  sample  sizes. 


N 

Run  time  (sec) 

20 

<0.01 

50 

0.02 

100 

0.2 

200 

2.9 

300 

16 

400 

56 

500 

143 

Table  18.  Typical  time  to  compute  N  /  2  orthogonal  successive  optimal 
matchings  using  Derigs’s  algorithm  in  R. 


As  mentioned  previously,  the  theoretical  runtime  for  existing  algorithms  to  find  a 
single  optimal  non-bipartite  matching  on  a  complete  graph  is  A3 j .  Our  ESPM 

statistic  involves  computing  N  /  2  successive  optimal  matchings  on  a  graph,  which  can 
lead  to  lengthy  run  times  for  large  sample  sizes.  Long  runtimes  for  cases  of  very  large 
sample  size  pose  a  practical  limitation  to  the  matching  methods  we  propose.  We  discuss 
related  research  opportunities  in  the  next  chapter. 
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V.  CONCLUSIONS  AND  OPPORTUNITIES  FOR  FURTHER 

RESEARCH 


In  this  dissertation,  we  introduce  new  nonparametric  matching-based  approaches 
to  the  multidimensional  change-point  problem.  These  approaches  lead  to  effective 
change-point  detection  procedures  and  highlight  the  potential  value  of  matching 
techniques  to  more  general  statistical  applications.  Our  review  of  the  broad  field  of 
change-point  detection  reveals  that  this  continues  to  be  an  area  of  active  research  and  that 
robust  multivariate  approaches  to  this  problem  remain  few.  Most  existing  approaches 
make  restrictive  distributional  assumptions  (such  as  multivariate  normality)  or  are  limited 
to  the  single-test  case  where  the  potential  change  point  is  pre-detennined  and  the  problem 
is  the  classical  one  of  testing  whether  two  samples  of  observations  are  from  the  same 
distribution. 

We  propose  four  new  change-point  tests:  the  Sum  of  Pair-Maxima  (SPM)  test,  the 
Non-Bipartite  Accumulated  Pairs  (NAP)  test,  the  Ensemble  Sum  of  Pair-Maxima 
(ESPM)  test,  and  the  Bipartite  Accumulated  Pairs  (BAP)  test.  The  first  three  tests, 
designed  to  test  for  homogeneity  among  multivariate  data  when  no  observation  history  is 
available,  all  demonstrate  power  to  detect  a  change  point  under  a  variety  of  alternative 
hypotheses  at  fixed  false  alann  rates.  The  ESPM  test  utilizes  additional  change-point 
information  available  from  many  good  (that  is,  low-weight)  orthogonal  matchings,  and  is 
superior  among  these  nonparametric  tests  to  detect  a  change  point.  Additionally,  the 
ESPM  test  has  power  comparable  to  a  parametric  competitor,  the  JJS  test,  even  when  its 
parametric  assumptions  are  met.  The  power  of  the  ESPM  test  not  only  establishes  itself 
as  an  effective  change-point  test,  but  also  validates  matching  as  a  useful  approach  to  the 
change-point  problem. 

This  research  invites  several  possibilities  for  extension.  One  obvious  question  is 
whether  or  not  any  of  these  tests  might  be  reasonably  extended  as  sequential  change- 
point  tests.  While  it  is  difficult  in  general  to  sequentialize  hypothesis  tests,  sequential 
change-point  detection  techniques  would  have  valuable  application.  One  requirement  for 

such  an  extension  would  be  to  extend  the  theory  presented  here  as  necessary  for 
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sequentialization.  Another  practical  problem  associated  with  such  an  extension  is  the 
question  of  how  to  efficiently  update  an  existing  optimal  matching  on  N  observations 
with  the  addition  of  one  or  more  data  points  to  the  observation  set. 

Other  opportunities  include  finding  ensemble  extensions  for  the  NAP  and  BAP 
tests.  The  fact  that  the  exact  distributions  of  these  individual  tests  are  known  might  make 
the  task  of  finding  exact  associated  ensemble  distributions  (or  good  approximations) 
more  tractable.  Additionally,  the  performance  of  the  BAP  test  remains  to  be  evaluated. 

One  challenge  to  research  in  this  area  is  the  scarcity  of  non-bipartite  weighted 
matching  software  modules  for  typical  statistical  software  applications.  The  simulation 
study  for  this  research  relies  heavily  on  interfaces  between  C,  C++,  or  FORTRAN;  and 
S-PLUS®,  R,  or  MATLAB®  that  we  or  others  have  built  manually.  The  mainstreaming 
of  any  such  interface  would  greatly  enable  broader  related  research.  Research 
opportunities  exist  to  improve  the  efficiency  of  non-bipartite  matching  algorithms.  Even 
using  existing  algorithms,  time  improvements  would  be  gained  by  reducing  the  number 
of  orthogonal  successive  optimal  matchings  computed  for  the  ESPM  test.  As  presented 
here,  the  ESPM  statistic  is  fonned  by  summing  over  N 12  orthogonal  successive  optimal 
matchings  where  N  is  the  sample  size.  Additional  research  is  necessary  to  detennine 
whether  fewer  (perhaps  far  fewer)  orthogonal  successive  optimal  matchings  are  adequate 
to  achieve  good  detection  power  against  alternate  hypotheses.  Also,  it  would  be 
worthwhile  to  investigate  the  usefulness  of  “greedy”  algorithms  in  this  context.  A  greedy 
matching  algorithm  finds  a  good  matching  on  N  observations  by  pairing  the  two  closest 
points,  then  the  next  two  closest,  and  so  on  until  a  maximal  matching  has  been 

constructed.  This  faster  algorithm  (C^A2) )  does  not  in  general  provide  an  optimal  non- 

bipartite  matching.  However,  a  greedy  matching  may  still  be  good  enough  to  provide 
valuable  change-point  information;  we  believe  this  would  be  a  worthwhile  area  for  study. 

Rosenbaum’s  (2005)  case  for  the  consistency  of  the  cross-match  statistic  seems 
quite  reasonable,  but  as  he  states  it  is  “admittedly  informal.”  His  argument  is  also 
constrained  to  less  general  alternative  hypotheses  than  we  have  considered  in  our  work. 
Because  our  argument  for  the  consistency  of  the  SPM  test  (and  therefore  for  the  ESPM 
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test  by  direct  extension)  requires  the  consistency  of  the  cross-match  statistic,  work  needs 
to  be  done  to  establish  the  consistency  of  the  cross-match  statistic  more  formally  and 
against  alternative  hypotheses  of  a  more  general  nature. 

We  alluded  previously  to  the  fact  that  machine  health  diagnosis  and  prognosis 
problems  were  an  initial  motivation  for  this  research,  and  we  are  interested  in  ways  to 
apply  this  work  to  that  area.  Such  problems  are  often  characterized  by  high 
dimensionality  and  serial  correlation.  In  addition  to  detecting  the  presence  of  a  change 
point  in  a  sequence  of  observations,  it  would  be  useful  also  to  estimate  where  in  the 
sequence  the  change  point  occurred.  Furthermore,  it  would  be  helpful  in  the  event  that  a 
change  point  is  detected  to  characterize  the  nature  of  the  change  (for  example,  abrupt  or 
gradual)  and  the  severity  of  the  change  for  prognostic  purposes  such  as  estimating 
remaining  useful  life. 

An  idea  that  seems  worthy  of  consideration  is  a  generalization  of  our  matching 
approach  to  vertex  groupings  of  cardinality  greater  than  two.  The  tests  we  propose  here 
are  all  matching-based,  where  we  mean  matching  in  the  strict  graph-theoretical  sense  as 
defined  in  Chapter  II.  Each  matching  is  a  collection  of  single  edges,  and  each  edge  is  in 
turn  is  a  two-element  subset  of  vertices.  Algorithms  to  find  optimal  non-bipartite 
weighted  matchings  already  exist,  and  we  have  demonstrated  that  matchings  can  be  used 
for  effective  statistical  inference.  However,  it  might  be  worth  examining  whether 
collections  of  more  than  one  edge  (that  is,  collections  of  more  that  two  vertices)  might 
provide  useful  (or  even  better)  information  to  the  change-point  problem.  For  example, 
instead  of  computing  an  optimal  non-bipartite  matching  on  a  set  of  observations  one 
might  compute  an  optimal  “three-grouping,”  where  the  objective  function  for  optimality 
might  be  to  minimize  the  collective  cycle  cost  or  collective  minimum  spanning  tree  cost 
across  subgroups  of  size  three.  Similar  to  the  SPM  test,  one  might  consider  the  sum  of 
group-maxima  (or  -minima,  or  -median,  or  some  other  unary  set  operator).  Even  more 
general  “k-groupings”  might  be  considered.  Unlike  the  well-known  method  of  k-means 
clustering,  which  partitions  N  observations  into  k  groups  (perhaps  of  different  sizes) 
based  on  an  objective  criterion  such  as  minimizing  the  sum  if  within-cluster  differences,  a 
“k-groupings”  approach  would  specify  group  size  k  first  and  then  collect  vertices  into 
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groups  so  as  to  minimize  some  particular  criterion.  We  are  not  aware  of  any  specialized 
algorithms  to  find  such  groupings  and  the  associated  statistical  properties  of  such 
groupings  are  likely  quite  complicated,  but  these  ideas  might  constitute  fertile  ground  for 
research. 

Another  interesting  idea  involves  retaining  some  of  the  original  observation 
information  for  the  computation  of  a  test  statistic.  The  methods  we  propose  use  the 
observed  data  in  two  distinct  steps:  First  we  compute  an  optimal  non-bipartite  matching 
based  on  observation  content  excluding  data  sequence  labels,  then  we  compute  a 
nonparametric  test  statistic  based  only  on  sequence  labels  with  respect  to  that  matching. 
However,  it  might  be  useful  to  carry  over  additional  information  from  the  data  into  the 
computation  of  a  test  statistic.  For  example,  one  might  associate  with  each  pair  in  the 
matching  some  measure  of  pair  “quality”  based  on  the  cost  of  the  pair.  These  quality 
values  might  then  be  used  as  weightings  in  the  computation  of  a  sum  of  pair-maxima- 
type  statistic,  and  perhaps  improve  the  detection  power  of  such  a  test. 

Finally,  an  area  upon  which  we  have  only  touched  briefly  involves  the  choice  of 
cost  function.  In  research  such  as  ours  the  existence  of  some  appropriate  dissimilarity 
measure  associated  with  the  sample  space  of  interest  is  often  assumed  and  from  there  the 
desired  analysis  proceeds.  While  our  theoretical  results  regarding  non-ensemble  null 
distributions  depend  only  on  the  exchangeability  of  sequence  labels  and  not  on  choice  of 
cost  function,  we  expect  detection  power  against  alternative  hypotheses  to  depend  on  that 
choice.  While  Mahalanobis  distance  (or  some  robust  modification  of  Mahalanobis 
distance)  is  a  natural  choice  of  cost  function  for  continuous  random  variables,  cases  of 
interest  may  include  a  mixture  of  categorical,  ordinal,  and  continuous  random  variables. 
Even  for  continuous  cases,  the  ability  to  detect  change  points  with  respect  to  a 
Mahalanobis  distance  function  might  be  improved.  For  example,  shifting  observations 
by  a  component-wise  smoothed  mean  can  lead  to  better  covariance  matrix  estimation  in 
cases  where  a  change  point  exists.  In  any  case,  further  study  of  the  effects  of  cost 
function  choice  on  the  power  of  tests  presented  here  would  be  of  useful,  especially  for 
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cases  that  include  categorical  or  ordinal  dimensions.  In  particular,  for  real-world 
application  of  these  methods  it  would  be  worth  investigating  which  cost  functions  lead  to 
the  most  attractive  power  characteristics  for  the  specific  case  at  hand. 
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APPENDIX  A:  HIGHER  MOMENT  DERIVATIONS 


In  this  appendix,  we  derive  various  moments  associated  with  the  Sum  of  Pair- 
Maxima  (SPM)  statistic. 

A.  MEAN  AND  VARIANCE  OF  Yl 


For  a  non-bipartite  match  on  N  =  2 n  observations,  each  of  the  (2 n) !  possible 
assignments  of  ranks  is  equally  likely  under  the  null  hypothesis,  and  the  random  variables 
F|,...,7n  are  exchangeable.  Therefore,  Yx  takes  on  the  value  t  when  observation  t  is 

paired  with  some  observation  5  of  lesser  sequence  label  and  (s,t)  is  indexed  as  the  first 


among  the  n  pairs.  But 
(E.l) 


P[t  is  paired  with  5} 


1 

2n  —  1 


and 

(E.2) 


P  j(s,t)  is  indexed  first  among  matches]  t  is  paired  with  ,s'| 


? 

n 


so 


(E.3) 


s=l 


P is  indexed  first  among  matches  ]/  is  paired  with . 
xP  [t  is  paired  with. 


t- 1 


'-1  1  1 

Vi.— -  : 

~ln  (2«-l)  n(2«-l) 


=  ('-!) 


2  n 

V  2  J 


,  t  =  2,...,2n 


The  desired  expressions  follow  directly: 


*M=E< 


(f-1)  _2(2n  +  l) 


=7  n 


(E.4) 


(2/i-l)  3 

(f  — l)  _  (2«  +  l)  (3/i  +  l) 
n[2n  —  l)  3 

Var[Y,]=E\Y?]-E\Y$  =  (E±fEPi 


£[F]  =  E'! 


t=  2 
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B.  COVARIANCE  OF  Yx  AND  Y2 


First  we  find  P(Y2  =t  \  YX  =  s)  by  observing  that  for  the  case  t  <s  , 


t- 1 


f(y2 =<ii; =*)= 


Z(f"2) 


U= 1 


2/i-2 

\  2  J 


(2 nY  O—oV1'- 


EM) 


V  ^  y  «=<+i 


2n-2 

V  2  J 


r  2  n' 

\2  j 


(E.5) 


h-i) 


2  n 

v  2  y 


(t-i)(5-3) 


(s-l)(n-l)(2n-3) 


,  f  =  2,...,s-l 


and  for  the  case  t  >  s  . 


^(n=<|i;=j)=- 


Z('-3) 


' 2n-2 7 

-1 

r2n^ 

v  2  y 

v  2  J 

(E.6) 


h-1) 

t- 3 


f2n^ 
V  2  7 


t  =  5  + 1, . . . ,  2n . 


(« -l)(2n -3)  ’ 

Now  condition  on  Yx  to  compute  E[Y2  Yx]  : 

*Kn; =»]=£',  V't?  3)+Z >j  ,'J23  3' 

,=2  (j-1)(/i-1)(2/i-3)  ,=s+i  (n-l)(2«-3) 


(E.7) 


5 (5 -2) (5 -3)  f  4h(2h  +  1)(h-2)  s(s  +  l)(s-4) 


3  ( //  —  1)  ( 2/7  —  3) 


4h(2/7  +  1)(«-2)-2s(s-5) 
3(«-l)(2«-3) 


so 


(E.8) 


£[Fi^]  =  £[i;£[r!|F1]] 

4n (2n  + 1) (n  -  2)  -  2s  (5  -  5)  (5  - 1) 


In 

=  5*  3  (/7  —  l)  (2/7  —  3) 

8(2n  +  l)(5«  +  2) 

45  ' 


((2/7-1) 
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Therefore, 


Cov[^,t2]  =  f[t2i;]-f[t2]f[^] 

8(2n  +  l)(5n  +  2) 

2  (2/7  +  1)  j2 

45 

3 

^  / 

4(2/i +  1) 

45 

C.  THIRD  MOMENTS  OF  Y1 ,  Y2 ,  AND  73 


As  in  the  first  and  second  moment  calculations,  compute  E  if  directly: 


(E.  10) 


fy  Ml  (2«  +  l)(24ir  + 15/1  +  1) 
1  *  t=2  n{ln  —  \)  15 


Now  compute  E by  conditioning  on  Yx  using  (E.7): 

e[yX]  =  e[y^e[y2\y,]] 


2  n 


(Ell) 


=2> 

s= 2 


An (2n  + 1)  (n  -  2)  -  2s  (s  -  5)  (s  - 1) 


3(n-l)(2n-3) 
4(2n  +  l)(l5n2  +10n  +  l) 


\(2n  -1) 


45 

In  a  similar  fashion,  we  apply  a  series  of  conditioning  arguments  to  compute 
E\Yx  Y2  K ] .  First,  let  Z(1),Z(2),  and  Z(3)  take  on  the  values  of  YX,Y2,  and  K  such  that 

Z(|j  <  Z(2)  <  Z(3)  (these  Z;ij  are  unrelated  to  the  Z(  of  the  BAP  test).  Then 


e[yj2y,]  =  e 


7  7  7 

W  (2)  (3). 


=  E 


7  F 

z(3  r 


Z  F 

z(2r 


z  z 

Z(2)’Z(3) 


J(3) 


A  direct 


combinatorial  argument  gives 


2  2 

r2n-2^ 

^2/7  -4^ 

v2> 

v  2  , 

v  2  , 

2  <  <  t2<t3<  2 n, 
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so 


2n-2 


P[Z(2)  ~t2’^(3)  ~  X  ^(^(1)  _^’^(2)  “  ?2’Z(3)  _^) 

<i=2 

(E.  13)  _.3fe-l)(f2-2)(f2-3)(f3-5) 


f  2«^ 


f  2«  -  2V  2n-4 


,  ,  4  <  t2  <  t3  <  2n, 


v  ^  yv  ^  yv  ^  y 


and 


2n-l 


<2=4 

(E.14)  _3(f3  -l)(f3 -2)(f3  -3)(f3 -4)fc -5) 

f  2n\f  2n  —  2\f  2n-4'\ 


6  <  t,  <  2  n. 


v^-yv  ^  yv  ^  y 


Therefore, 


(E.15) 


-M^(i)  _h 


Z(2)  -  t2,Z (3j  - 


^{^(1)  h’^(2)  ^2  ’^(3) 

P{Z(2)  =  ^2’^(3)  =  ^3) 
_  2(^-1) 


(^2  l)  (^2  2) 


2<tl<t2<t3<  2 n. 


and 


(E-16) 


^*1  ^(2)  —  ^2 


■^(3)  _  ^3  I  _ 


P{^(2)  ^2  5^(3)  ^3) 


P(Z<3)  '’) 

4(<;-l)(<3-2)(t,-3) 

(<3-l)((J-2)(<J-3)(<J-4) 

Now  compute  conditional  expected  values 


,  4  <  t2  <  t3  <  2n. 


J(i) 


Z(2)  -  t2,Z^ 3)  - 13 


<2-1 


t,= 2 


Z( 2)  -  ^2  5^(3)  -  *3 


(E.17) 


_  (h  l) 

_^(t2-l)(t2-2) 


2t, 
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and 


(E.18) 


Finally, 


(E.19) 


7  F 
Z(2)C 

_Z(1) 

1 - 1 

m 

N 

(N 

N 

Z(3)  -  ?3 

=  E 

2  Z2 

/Z(2) 

3 

Z(3)  -  h 

^2 1\  4(ti -!)(/, -2)((, -3) 

h  3  (f,-l)(fa -2)(f,-3)(f, -4) 

^(5(3-1) 

45 


E[YJ2Y,]  =  E 


7  F 

z(3  r 


7  F 

z(2  r 


"(i) 


7  7 

Z(2)’Z(3) 


"(3) 


=  E 


gwKr1) 

45 


. f  ^  (5V-1)  3fc  -l)(f3  -2)(f3  -3)fo  -4)(f3  -5) 

z"'  45  f  2n\(  2n  —  2\(2n  -4^ 

4 

2  2 


t-i=6 


16(2n  +  l)(70n2+49«  +  6) 


945 
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APPENDIX  B:  QUANTILE  TABLES 


A.  APPROXIMATE  CRITICAL  VALUES  LOR  TN 


a 

N  =  6 

8 

10 

12 

14 

16 

18 

20 

22 

0.001 

- 

- 

- 

43 

58 

76 

97 

120 

145 

0.005 

- 

- 

31 

44 

60 

79 

99 

123 

149 

0.01 

- 

- 

31 

45 

61 

80 

101 

125 

151 

0.025 

- 

21 

32 

46 

63 

81 

103 

127 

154 

0.05 

- 

21 

33 

47 

64 

83 

105 

129 

156 

0.1 

13 

22 

34 

48 

65 

85 

107 

132 

159 

a 

24 

26 

28 

30 

32 

34 

36 

38 

40 

0.001 

173 

204 

237 

272 

310 

351 

394 

440 

489 

0.005 

178 

209 

242 

279 

317 

359 

403 

449 

498 

0.01 

180 

211 

245 

282 

321 

363 

407 

454 

503 

0.025 

183 

215 

249 

286 

326 

368 

413 

460 

510 

0.05 

186 

218 

253 

290 

330 

373 

418 

466 

516 

0.1 

189 

222 

257 

295 

335 

378 

424 

472 

523 

a 

60 

80 

100 

120 

140 

160 

180 

200 

220 

0.001 

1113 

1995 

3137 

4537 

6199 

8121 

10304 

12749 

15455 

0.005 

1131 

2023 

3175 

4588 

6262 

8199 

10397 

12857 

15581 

0.01 

1140 

2036 

3194 

4612 

6293 

8236 

10442 

12910 

15641 

0.025 

1152 

2056 

3221 

4648 

6338 

8292 

10508 

12987 

15731 

0.05 

1163 

2073 

3244 

4679 

6377 

8339 

10564 

13054 

15807 

0.1 

1176 

2092 

3272 

4715 

6422 

8394 

10630 

13130 

15896 

a 

240 

260 

280 

300 

320 

340 

360 

380 

400 

0.001 

18424 

21655 

25148 

28903 

32922 

37203 

41747 

46554 

51624 

0.005 

18567 

21816 

25328 

29103 

33142 

37444 

42009 

46838 

51931 

0.01 

18636 

21894 

25415 

29200 

33248 

37560 

42136 

46976 

52080 

0.025 

18737 

22008 

25543 

29342 

33404 

37732 

42323 

47179 

52299 

0.05 

18825 

22107 

25653 

29464 

33539 

37879 

42483 

47353 

52487 

0.1 

18925 

22220 

25780 

29604 

33694 

38049 

42668 

47553 

52703 

Table  19.  Estimated  critical  values  for  TN . 


Critical  regions  correspond  to  values  of  ±N  strictly  less  than  the  appropriate 
quantile;  N  sample  size  and  a  is  significance  level.  Values  for  N  =  6,  8,  and  10  are 
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exact;  values  for  ^V>10  are  approximations  by  Edgeworth  expansion  using  (3.31).  A 
dash  entry  means  that  the  significance  level  of  interest  cannot  be  attained  for  that 
sample  size. 


B.  APPROXIMATE  CRITICAL  VALUES  LOR  KN 


N  =  20 
a 

0.001 

0.005 

0.01 

0.025 

0.05 

0.1 


N  =  40 
a 

0.001 

0.005 

0.01 

0.025 

0.05 

0.1 


N  =  60 
a 

0.001 

0.005 

0.01 

0.025 


d  =  1 

2 

3 

4 

5 

10 

20 

50 

2.57 

2.40 

2.27 

2.17 

2.10 

1.93 

1.88 

1.78 

[.04] 

[.03] 

[.04] 

[.02] 

[.02] 

[.02] 

[.02] 

[.02] 

1.96 

1.88 

1.81 

1.76 

1.72 

1.60 

1.56 

1.52 

[.02] 

[.02] 

[.02] 

[.01] 

[.02] 

[.02] 

[.01] 

KOI] 

1.72 

1.66 

1.60 

1.56 

1.53 

1.46 

1.43 

1.38 

[.02] 

[.01] 

[.01] 

[.01] 

[.02] 

[.01] 

KOI] 

[.01] 

1.38 

1.35 

1.33 

1.31 

1.29 

1.24 

1.22 

1.21 

KOI] 

[.01] 

[.02] 

[.01] 

[.02] 

KOI] 

[.02] 

KOI] 

1.13 

1.12 

1.10 

1.10 

1.09 

1.07 

1.07 

1.03 

KOI] 

[.02] 

[.01] 

KOI] 

[.01] 

KOI] 

[.01] 

[.02] 

0.90 

0.90 

0.89 

0.89 

0.89 

0.88 

0.88 

0.86 

KOI] 

KOI] 

KOI] 

[.01] 

[.01] 

[.01] 

[.02] 

[.01] 

d  =  1 

2 

3 

4 

5 

10 

20 

50 

2.77 

2.57 

2.42 

2.31 

2.22 

2.02 

1.95 

1.84 

[.04] 

[.04] 

[.03] 

[.04] 

[.03] 

[.02] 

[.02] 

[.02] 

2.11 

1.99 

1.90 

1.83 

1.78 

1.66 

1.62 

1.56 

[.02] 

[.02] 

[.01] 

[.02] 

[.01] 

[.01] 

[.01] 

[.01] 

1.83 

1.74 

1.68 

1.63 

1.59 

1.50 

1.47 

1.43 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

1.47 

1.42 

1.38 

1.35 

1.33 

1.28 

1.26 

1.24 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

KOI] 

1.20 

1.17 

1.15 

1.14 

1.13 

1.10 

1.09 

1.08 

[.01] 

[.01] 

KOI] 

KOI] 

KOI] 

KOI] 

KOI] 

KOI] 

0.93 

0.92 

0.92 

0.91 

0.91 

0.91 

0.91 

0.90 

KOI] 

KOI] 

KOI] 

KOI] 

KOI] 

KOI] 

KOI] 

KOI] 

d  =  1 

2 

3 

4 

5 

10 

20 

50 

2.80 

2.61 

2.46 

2.35 

2.26 

2.05 

1.97 

1.86 

[.04] 

[.05] 

[.04] 

[.03] 

[.03] 

[.02] 

[.02] 

[.02] 

2.15 

2.02 

1.93 

1.86 

1.80 

1.68 

1.64 

1.57 

[.02] 

[.02] 

[.02] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

1.85 

1.76 

1.70 

1.65 

1.62 

1.53 

1.50 

1.44 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

1.47 

1.43 

1.39 

1.37 

1.35 

1.30 

1.28 

1.25 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

KOI] 

KOI] 
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0.05 


1.20 

1.18 

1.16 

1.15 

1.14 

1.11 

1.10 

1.09 

[.01] 

[.01] 

[<.01] 

[<.01] 

[<.01] 

[<.01] 

[<.01] 

[<.01] 

0.94 

0.93 

0.93 

0.93 

0.92 

0.92 

0.92 

0.92 

[<•01] 

[<.01] 

[<.01] 

[<.01] 

[<.01] 

[<.01] 

[<.01] 

[<.01] 

jV>80 

a 

0.001 

0.005 

0.01 

0.025 

0.05 

0.1 


N  is  sample  size,  d  is  dimension,  and  a  is  significance  level.  Critical  values 
are  listed  with  associated  standard  error  in  square  brackets.  Critical  regions  correspond  to 
values  of  KN  strictly  greater  than  the  appropriate  quantile.  Critical  values  are  computed 

by  100,000  simulations  for  each  case  of  sample  size  and  dimension  using  uniformly 
distributed  data  and  matching  with  respect  to  Euclidean  distance.  Standard  error  for 
quantiles  is  determined  by  the  Maritz-Jarrett  method  (Maritz  and  Jarrett,  1978). 
Simulation  suggests  that  these  critical  value  approximations  are  independent  of 
underlying  distribution  and  cost  function. 

Interpolate  to  find  critical  values  for  N ,  d  ,  or  a  not  provided  in  the  table.  Use 
d  =  50  to  approximate  critical  values  for  d  >  50 .  For  sample  size  or  dimension  far 
outside  the  bounds  of  these  tables,  critical  values  ought  to  be  approximated  by  simulation 
for  the  case  at  hand. 


d  =  1 

2 

3 

4 

5 

10 

20 

50 

2.88 

2.67 

2.50 

2.38 

2.29 

2.06 

1.98 

1.88 

[.06] 

[.03] 

[.03] 

[.03] 

[.03] 

[.02] 

[.02] 

[.02] 

2.15 

2.04 

1.96 

1.89 

1.84 

1.70 

1.65 

1.59 

[.02] 

[.02] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

1.86 

1.78 

1.72 

1.67 

1.63 

1.54 

1.50 

1.45 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

1.49 

1.45 

1.41 

1.38 

1.36 

1.31 

1.29 

1.26 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

[.01] 

KOI] 

1.21 

1.19 

1.18 

1.16 

1.15 

1.13 

1.11 

1.10 

[.01] 

[.01] 

[.01] 

[<.01] 

[<.01] 

[<.01] 

[<.01] 

[<.oi] 

0.95 

0.94 

0.94 

0.94 

0.93 

0.93 

0.93 

0.92 

[<.01] 

[<.01] 

[<.01] 

[<.01] 

[<.01] 

[<.01] 

[<.01] 

KOI] 

Table  20.  Approximate  critical  values  for  KN . 
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APPENDIX  C:  R  FUNCTIONS  FOR  CRITICAL  ENVELOPES 


In  this  appendix,  we  provide  R  (2005)  language  code  to  compute  critical 
envelopes  for  the  Non-Bipartite  Accumulated  Pairs  (NAP)  and  Bipartite  Accumulated 
Pairs  (BAP)  tests. 

A.  CRITICAL  ENVELOPE  FOR  THE  NAP  TEST 

function (alpha,  n) { 

#  alpha  =  non-simultaneous  alpha  value  (rejection  for 

#  exceeding  a  critical  threshold) 

#  n  =  number  of  pairs  (N  =  2*n  is  total  sample  size) 

#  Returned  is  a  list  of  critical  boundary  values,  and  the 

#  probability  of  violating  at  least  one  of  them.  Boundary 

#  values  themselves  are  not  critical  (e.g.  reject  the  null 

#  if  any  value  is  strictly  greater  than  the  boundary  value) . 
nl  <-  n  -  1 

N  <-  2  *  n 
Nl  <-  N  -  1 
qvec  <-  numeric (Nl) 
for  (k  in  2 : Nl )  { 

rmin  <-  max (c (0,  k  -  n) ) 
rvec  <-  rmin: floor (k/2) 

cvec  <-  cumsum (choose (n,  k  -  rvec)  *  choose (k  -  rvec,  rvec) 

*  2A (k-2  *  rvec )) /choose (N,  k) 
qvec[k]  <-  which (cvec  >  (1-alpha  -  le-010))  [1]  +  rmin-1 

} 

qvec  [  1 ]  <-  1 

kstar  <-  max (which (qvec  <  1 : Nl ) ) 
a  <-  rep  (0 ,  n) 
a  [  1 : 2 ]  <-  1 
qv  <-  qvec [2 ] 
for(k  in  3:kstar)  { 

aO  <-  a  *  ((0:nl)  <=  qv) 
qv  <-  qvec [k] 
qvl  <-  qv  +  1 

if  (qv  >  0)  a[2:qvl]  <-  a0[2:qvl]  -  (2  *  (dif f (aO [1 : qvl ] )  * 

( 1 : qv ) ) ) / k 

} 

rvec  <-  max(c(0,  kstar  -  n)):qv 

cvec  <-  (choose  (n,  kstar  -  rvec)  *  choose (kstar  -  rvec,  rvec)  * 

2A  (kstar  -  2  *  rvec)  ) /choose (N,  kstar) 
alpha. sim  <-  1  -  sum(a[rvec  +  1]  *  cvec) 

return (list (kseq  <-  2:N1,  envelope  =  qvec[-l],  alpha. sim  = 
alpha . sim) ) 

} 
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B. 


CRITICAL  ENVELOPE  FOR  THE  BAP  TEST 


function (alpha,  m,  n) { 

#  alpha  =  non-simultaneous  alpha  value  (rejection  for 

#  exceeding  a  critical  threshold) 

#  m  =  number  of  control  points 

#  n  =  number  of  test  points  (must  have  n  >  m) 

#  Returned  is  a  list  of  critical  boundary  values,  and  the 

#  probability  of  violating  at  least  one  of  them.  Boundary 

#  values  themselves  are  not  critical  (e.g.  reject  the  null 

#  if  any  value  is  strictly  greater  than  the  boundary  value) 
if (n  <=  m)  { 

cat("***  Invalid  arguments  ***",  "\n") 
return ( ) 

} 

qvec  <-  qhyper(l  -  alpha,  m,  n  -  m,  1: (n  -  1) ) 
sq  <-  which (diff (c (0,  qvec))  <  le-010) 

pvec  <-  ( ( (m  -  qvec[sq])/(n  -  sq  +  1))  *  dhyper (qvec [ sq] ,  m,  n 
m,  sq) ) /phyper (qvec [sq] ,  m,  n  -  m,  sq) 
return ( list (envelope  =  qvec,  alpha. sim  =  1  -  prod(l  -  pvec))) 

} 
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APPENDIX  D:  EXAMPLE  DATA  FOR  FIGURES  1-7 


xl 

x2 

0.8057 

0.2209 

-1.3556 

-1.0061 

0.1209 

-0.4531 

-0.2222 

1.3995 

0.5717 

-0.4620 

-0.3001 

0.0327 

1.1343 

0.7988 

-0.1794 

0.8968 

-1.4671 

0.1379 

1.3953 

-1.6191 

0.4408 

-1.6466 

0.5654 

0.4287 

-0.6936 

-0.7372 

0.8339 

0.5649 

-2.2374 

-1.3842 

1.0976 

0.4603 

-0.0016 

0.6294 

-1.6146 

0.3798 

-1.2287 

-1.0133 

0.2074 

-0.3472 

Table  21.  Example  data  for  Figures  1-7 
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